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ABSTRACT 

This  report  treats  the  design  of  discrete  filters  for 
the  detection  of  signals  caused  by  nuclear  explosions  on 
digitized  seismic  recordings.  The  theoretical  aspects  of 
filter  design  a  e  treated,  together  with  the  setting  up  of 
the  necessary  formulas  for  realizing  the  filters  on  digital 
computers.  Specific  discrete  filters  so  treated  are:  (1) 
matched  filter,  (2)modified  matched  filter,  (3)modified 
mauched  filter  for  a  multiparameter  model,  (4)filter  for 
the  elimination  of  trend  components,  (5)time-invariant 
filter,  (6 )time-invariant  filter  in  the  noiseless  case, 
(7)spike  filter,  (8)time-varying  filter,  (9)detection 
filter,  and  (10) squared  magnitude  devices.  The  normal 
equation  forms  in  optimum  filtering  problems  for  the  deter¬ 
mination  of  the  filter  coefficients  and  the  error  are 
developed  for  (l)single  processes,  (2)multi-channel  pro¬ 
cesses,  and  (3)multi-dimensional  processes.  Recursive 
computational  schemes  are  presented  for  normal  equations 
of  Toeplitz  form.  For  single  processes  the  Levinson 
recursion  for  the  extension  of  the  prediction  error  operator 
and  the  extension  of  the  general  filter  is  developed,  as 
well  as  the  recursion  to  move  the  output  origin.  A  corres¬ 
ponding  development  is  given  for  multi-channel  processes, 
as  well  as  a  development  of  the  recursion  to  larger 
operators  for  the  multi-dimensional  processes. 

The  prediction  problem  for  single  stationary  time 
series  is  reviewed  and  the  least  square  and  Kolmogoroff 
solutions  given.  Extension  is  then  made  to  the  multiple 
case,  the  least  squares  equations  set  up  and  the  Wiener- 
Masani  factorization  described.  Heuristic  use  is  made  of 
the  Hilbert  space  property  of  time  series.  A  digital 
computer  program  for  performing  the  Wiener-Masanl  factori¬ 
zation  is  discussed. 


Requests  for  additional  copies  by  Agencies  of  the 
Department  of  Defense,  their  Contractors,  and  other 
Government  Agencies  should  D  J erected  to  the 

ARMED  SERVICES  TECHNICAL  INFORMATION  AGENCY 
ARLINGTON  HALL  STATION 
ARLINGTON  12,  VIRGINIA 

Department  of  Defense  contractors  must  be  estabiishee 
for  ASTIA  services  or  have  their  "need-to  know"  certi¬ 
fied  by  the  cognizant  military  agency  of  their  project 
or  contract. 
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ABSTRACT 

This  report  treats  the  design  of  discrete  filters  for 
tne  detection  of  signals  caused  by  nuclear  explosions  on 
digitized  seismic  recordings.  The  theoretical  aspects  of 
1  liter  design  are  treated,  together  with  the  setting  up  of 
the  necessary  formulas  for  realizing  the  filters  on  digital 
computers.  Specific  discrete  filters  so  treated  are:  (1) 
r etched  filter,  (2)modified  matched  filter,  (3)modifled 
■  'atched  filter  for  a  multiparameter  model,  (4)filter  for 
the  elimination  of  trend  components,  (5)time-invariant 
filter,  (6 ) time-invariant  filter  in  the  noiseless  case, 

(7) spike  filter,  (8)time-varying  filter,  (9)detection 
filter,  and  (10) squared  magnitude  devices.  The  normal 
equation  forms  in  optimum  filtering  problems  for  the  deter¬ 
mination  of  the  filter  coefficients  and  the  error  are 
developed  for  (l) single  processes,  (2)multi-channel  pro¬ 
cesses,  and  (3) multi -dimensional  processes.  Recursive 
computational  schemes  are  presented  for  normal  equations 
of  Toeplitz  form.  For  single  processes  the  Levinson 
recursion  for  the  extension  of  the  prediction  error  operator 
and  the  extension  of  the  general  filter  is  developed,  as 
well  as  the  recursion  to  move  the  output  origin.  A  corres¬ 
ponding  development  is  given  for  multi-channel  processes, 
as  well  as  a  development  of  tne  recirsic  to  larger 
operators  for  the  multi-dimensional  proc  es . 

The  prediction  problem  for  single  stationary  time 
series  is  reviewed  and  the  least  square  and  Kolmogoroff 
solutions  given.  Extension  is  then  made  to  the  multiple 
case,  the  least  squares  equations  set  up  and  the  Wiener- 
Masani  factorization  described.  Heuristic  use  is  made  of 
the  Hilbert  space  property  of  time  series.  A  digital 

computer  program  for  performing  the  Wiener-Masanl  factori¬ 
zation  is  discussed. 
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I  INTRODUCTION 

The  detection  of  weak  signals  from  nuclear  explosions 
on  seismic  recordings  presents  an  important  and  difficult 
problem,  both  from  a  practical  and  theoretical  point  of 
view.  As  a  result  every  effort  should  be  made  to  keep 
theory  and  practice  coordinated  and  in  balance  with  each 
other . 

Much  has  been  written  on  the  design  of  filters  for 
signal  detection,  not  only  in  seismology  but  in  all  branches 
of  science.  The  present  report  is  unique,  however,  in  that 
here  the  theoretical  aspects  of  filter  design  are  treated 
simultaneously  with  the  practical  setting  up  of  the  neces¬ 
sary  formulas  for  realizing  the  filters  on  digital  com¬ 
puters.  Therefore  by  making  use  of  the  material  developed 
in  this  report  one  can  right  away  analyze  seismic  data 
by  use  of  some  of  the  most  advanced  filters  known.  The 
seismic  data  is  required  in  digitized  form,  and  such  data 
is  now  readily  available  to  Vela  Uniform  Projects  from  the 
Vela  Uniform  Data  Center  in  Washington. 

One  of  the  most  important  contributions  of  this  Report 
is  the  development  of  practical  ways  to  design  multiple- 
channel  and  multi -dimensional  filters.  This  Report  repre¬ 
sents  the  first  least-squares  treatment  of  this  problem. 

Also  included  is  the  first  practical  investigation  of  the 
Wiener-Masanl  multiple  spectral  factorization.  Another 
important  contribution  of  this  Report  is  reflected  in  its 
completeness.  Here  one  can  find  detailed  treatment  of 
many  Important  types  of  filters,  some  presented  for  the 
first  time  from  the  digital  point  of  view. 


6 


NOTATION  CONVENTIONS 


In  order  to  preserve  a  general  consistancy  in  the 
notation  used  in  this  report,  we  have  adopted  the  fol¬ 
lowing  conventions. 


1)  Division  of  the  alphabet 

Specific  use  of  the  letters  of  the  alphabet 
are  assigned  in  the  body  of  the  report.  However, 
a  general  division  is 


A  1 

B  >  Transients  (wavelets,  operators, 

C  J  filters,  etc.) 

X  I 

Y  (  (Stationary)  Gnneral  Processes 

2  J 

2)  Use  of  Subscripts  and  Superscripts 

A  discrete  multi -dimensional  process  is 
designated  by 

A/  y.  ( 

*  t 

w here  t-  13  the  time  index 
-Care  space  indices 

and  N  is  open  to  any  particular  interpretation. 


The  dimensionality  of  the  process  is  given  by  the. 
total  number  of  super-  and  sub-scripts  to  the  right  of 
the  letter.  Thus,  the  example  above  has  4  dimensions. 
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The  order  of  a  process  is  given  by  the  number  of 
equivalent  one- dimensional  processes  in  a  multiple  pro¬ 
cess.  Thus,  it  is  given  by  the  product  of  the  maximum 
values  assumed  by  the  superscripts. 

3)  Upper  and  Lowei  case;  Script  and  Non-script  letters 
Unless  otherwise  defined,  the  following  conven¬ 
tions  for  upper  and  lower  case,  and  script  and  non¬ 
script  letters  will  be  used: 


Single 

Multiple 

Process  1 

_ _ _ 1 

Process 

Time 

HI 

n 

Domalr 

n 

ift  siC 

Frequency 

Domain 

X('>, 

Y(») 

,  r  m  ^  ri 
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2.  Discrete  Filters  for  Digital  Data 


2.1 

Hatched  Filter 

9 

2.2 

Modified  Matched  Filter 

15 

2.3 

Modified  Matched  Filter  for  a 
Multiparameter  Model 

22 

2  .4 

Elimination  of  the  trend 

components 

25 

2.5 

Time-invariant  filter 

32 

2  .6 

Time-invariant  filter  in 
noiseless  case 

the 

36 

2.7 

Spike  Filter 

37 

2.8 

Time-varying  filters 

38 

2.9 

Detection  filter 

45 

2.10 

Computational  aids 

58 

2.11 

Squared-magnitude  devices 

61 

2 .12 

References 

63 
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DISCRETE  FILTERS  FOR  DIGITAL  DATA 


1 .  Matched  filter 


Assumptions: 

(a)  The  signal  sfc  has  a  known  fixed  shape. 

(b)  The  noise  n^  is  a  normal  random  process. 

(c)  The  mean  value  E-^n^  is  known  to  be  equal 
to  zero. 

(d)  The  covariance  <^>tr  =  E  {ntnr}  the  noise 
is  known. 

(e)  Time  t  is  a  discrete,  integer-valued  parameter. 

(f)  The  observed  random  process  is  xfc. 

(g)  The  time  period  under  observation  is  t  a  1,2,...,N 

Problem: 

We  wish  to  test  whether  hypothesis  HQ  is  true  or 
hypothesis  H1  is  true,  where 

Hr  v5t  +  /n‘ 

Matrix  notation: 


X  = 

(x^,x^,  •  •  •  ,x^j)  ; 

lxN 

row 

vector 

n  = 

(n^  in2»  •  •  • 

lxN 

row 

vector 

s  =» 

(s1,s2,. ..,sN)  ; 

lxN 

row 

vector 

* 


<P  mm,  (f) 

”«  hi  r  if) 


•  •  I  *  •  • 

K  hi  K 


NxN  covariance  matrix  of 
the  noise 

(assumed  to  be  non-singula 


r) 
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det  <f>  :  determinant  of  the  covariance  matrix 

^  ^  :  inver8e  of  the  covariance  matrix  <j> 

A  prime  indicates  matrix  transpose. 


Probability  density 


Under  hypothesis  HQ,  x  is  normally  distributed  with 
mean  zero  and  covariance  matrix  4>  ;  that  is,  the  probability 

density  fQ(x)  is 


/ 

Mr*)* 


Under  hypothesis  ,  x  is  normally  distributed  with  mean  s  and 
covariance  matrix  ;  that  is,  the  probability  density  f^(x) 


is 


/ 


%  [-  j  ] 


Likelihood  ratio 

The  likelihood  ratio  (x)  is  formed  by  taking  the 
quotient  of  these  two  density  functions,  that  is 


k(o£)’- 


-  -j/ts-  yi* '-*•  s /is  - 


=  cxp\_-i  [- 
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(since  jpts'-S/*-*') .  The  observer  will  choose  hypothesis 

H  when 
o 

XU)  <  XQ 

where  X0  is  a  constant  determined  by  the  decision  criterion 
used.  Setting  A  (x)  -  X0  ,  and  taking  logarithms,  we  obtain 

sus' }  s  by  A. 

Thus  the  inequality  7t  (x)  <1,  is  the  same  as  the  inequality 

S/ts'}  <  ^  X0 

or,  what  i3  the  same  thing, 

.XyLts  <  by A0  4  j[ 


Decision  rule: 

Setting  0  -  log  Vl,  +-£  yub  >  we  have  the  following 
decision  rule: 

Choose  H0  (that  is,  say  that  the  signal  is  not 
present)  if 

s'  <  Q. 

Choose  H1  (that  is,  say  that  the  signal  is  present) 
if 

■XyUS'  >  6. 


That  is,  the  decision  is  based  on  the  test  statistic 

V  N 


'  l \  1/ 
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computed  from  the  observed  process 


and  the  known  signal 

^  ~  (  5i  >  ) 


Hence  we  may  say  that  the  observed  process  x  Is  compared  with, 
or  matched  to,  the  signal  s.  For  this  reason,  the  filter  which 
performs  the  computation  /yes'  is  called  a  matched  filter. 

Distribution  of  the  test  statistic; 

Since  the  test  statistic  x/j.5  is  a  linear  combination 
of  normal  random  variables  x,  it  follows  that  itself  is 

normally  distributed,  with  mean  and  variance  as  follows: 

Under  (i.e.  x  =  n): 

£{-Xu.S  }  -  0  <Lorvct  0, 

Wys 7-  £{(** S'/j=  E{ty*y yii'} 

£  (  SytL'lL  as'  } 

=  s/i'  £{4*}yus'  -  s/t^s' 

=  S/ts' 

<^>  -  E{*iW J t 
-  I  -  tAl  } 

/'/• 
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Under  Hx  (i.e.  x  =  s  +  n): 

E[*JUiE$  ~  E{*\ A*  '  jEust  £  {*j  -  S . 

y  y  ^ 

var c->uS‘  (the  same  as  the  variance  under 
H  ,  because  the  variance  of  a  random  variable 
doesn't  depend  upon  its  mean). 


Relaxation  of  the  normality  assumption 

Let  us  now  drop  assumption  (b),  that  is,  we  no  longer 
assume  that  nfc  is  a  normal  random  process,  but  instead,  we 
assume  that:  (b-^  nfc  is  a  random  process  with  unknown  distri¬ 
bution  . 

We  now  consider  a  filter  with  coefficients  given  by 


d, 


CL 


7V  J 


Nxl  column  vector 


The  input  to  this  filter  is  the  observed  random  process  x  and 
the  output  is  xa,  that  is: 


The  mean-square  output  is  defined  to  be 


£  {(*a.f)  -  £{(*d.)''X<i}  -  £  {cl'**  llJ 


-  d!  £{*'%}  . 
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Under  hypothesis  HQ  (x  =  n,  or  no  signal  present),  the  mean- 
square  output  is 


4.'  l 


Under  hypothesis  Hj  (x  -  s  +  n,  or  signal  present),  the  mean- 
square  output  is 


E  {(/td.)2  j  -  CL '  E  { (s  *-/)i  j  j  CL 


&  £[s's}a.  -h  cl  '  E  {s  'suj  c.  /  <L  a.*  a'Efcjnja. 


s's  (Z  V-  CL  <£>  a_ 

since 


E {K}  =  o  ,  E \<n/}  *  0  >  EfaUt  J  -  <E> 


We  wish  to  determine  *  such  that  the  ratio 


Mean-square  output  under  H-. 
Mean-square  output  under 


Cs's<l+<C'Pcl  ^  g_ysa_ 
a."  4>cl 


is  a  maximum,  or  equivantly  such  that  the  ratio 


4  5  SOL 

CL  <f>  CL 


is  a  maximum. 
Hence  we  require 


PA. 

da. 


(cl' cl)  j$'s  a_  -  (d  <t> a.)  cl  s's  a. 


=  O 


(&'  4>  a.)z 
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i 


or  . 

/  ,  (<X'S$<L\ 

SLSSa  -  *  o 

or 

s'sa  -  (pa.J-  o 

The  solution  is  / 

CL  =  X  S 

The  output  of  the  filter  is  / 

'pa.  -  s 

which  is  the  same  as  the  test  statistic  obtained  under 
the  assumption  that  n  was  normally  distributed. 

2  .  Modified  matched  filter 
Assumptions 

V/c  now  wish  to  modify  assumption  (a)  given  in 
section  1  as  follows: 

(a1)  The  signal  st  is  given  by 

st  =  c  ft 


where  c  is  an  unknown  constant  and  ffc  is  a  known 
fixed  function. 

Problem 

y/e  wish  to  test  whether  hypothesis  HQ  or  hypothesis 
is  true,  where 


h„: 

H,  ;  %  -  C  ^  t  /nt 


(c  -  o) 
(c.*  o) 
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Matrix  notation 

f  a  (f^,  f2>  •••  »  :  *xN  row  vector 

s  =  cf 

Likelihood  ratio 

The  likelihood  ratio  is 


=  c.  />/'] 

The  observer  will  choose  Hq  when  the  likelihood  ratio  A 
satisfied  A<  A0  for  some  fixed  threshold  A„ 
Because  c  is  unknown,  we  consider 


'WWA.C*)  <  Ae 


instead  of  A^-)<A#  .  The  maximum  occurs  when  the 

negative  of  the  exponent  of  A  (/%)  is  a  minimum,  that 
is,  when 
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We  have 

=  0'~ /jt/L-f  ' -h  c-fyu4'  -o 


or 


Hence 

/mt  AM  =  cxf>[cy/u -f'-  JL  /a^7] 

-o »<C<o° 


Taking  logarithms,  we  have 


Decision  rule 


Letting  G  =»  2  log  A0  , 


we  obtain  the  decision  rule: 
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Choose  Hq  (that 
not  present)  if 

CuJT 

Choose  (that 
present)  if 


is,  say  that  the  signal  is 


< 

is,  say  that  the  signal  is 


>  0. 


That  is,  the  decision  is  based  on  the  test  statistic 


('t/if 'f 
fyUf' 


computed  from  the  observed  process 

and  the  known 

We  see  that  the  test  statistic  is  a  quadratic  function  of  the 
observations  x  -  (x1,x2* . . .  ,Xjj)  . 

Maximum  likelihood  estimates 


Let  us  now  find  the  maximum  likelihood  estimates  of  c 
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and  s.  We  recall  that  the  density  function  of  x,  under  the 
normality  assumption,  is 


/ _ 

(At*)'*- 


fa-si 


udm  S-  ci. 


The  maximum- likelihood  estimate  of  s,  denoted  by  s,  is  that 
value  of  s  »  cf  for  which  f(x)  is  a  maximum.  The  maximum 
occurs  when  the  negative  of  the  exponent  of  f(x)  is  a  minimum 
that  is,  when 


P-  (' £-5 )u  (tf-s) 


minimum, 


subject  to  the  constraint  that  s  =  cf ,  We  shall  now  use  the 
method  of  Lagrange  multipliers  to  solve  this  constrained 
minimization  problem.  We  therefore  introduce  the  undetermined 


multipliers 


Nxl  column  vector. 


We  now  wish  to  minimize 


(s-cf^A  t  (s~  ) 


with  respect  to  s,  A  ,  and  c.  We  thus  obtain: 


21 -0  : 

-  (#r  ^  =  ^ 

dS 

&  s  0 

a-c-f  .© 

T--0 

9c 

W-0  . 
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Solving  these  equations,  we  have 

X  -  (■#- 

$'f's  U-iftjLi'  =0  , 

or 

=o  . 


Hence 


where  a  is  the  maximum-likelihood  estimate  of  s. 

Let  us  now  find  the  maximum-likelihood  estimate  of  c. 

the  above  reasoning,  it  is  that  value  of  c  obtained  by  requir 
lng 


P=  fa- 50  fa -  5)  “  fa-  t-f)u.fa-  c{)' ~  minimum 


We  thus  have 


at 

dC 


O  ■■  -oc-ifyr  .0  10' , 


which  is  the  same  c  as  found  above,  as  we  would  expect. 
Miscellaneous  notps 


By 


We  recall  that 
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f'  (i  .  4,  ) 


(non-singular  matrix) 


c  -  /^a£. 

■f*f'  ■ 

/  ,  f  / 

Njw  it  may  be  shown  that  the  quadratic  form  /*  at  is 
indicates  determinant): 


4  <t>u  ••• 

4> 

/A 

f, 

0 

Ift 

4> 

MV 

/JyLL  i'  ~  ~ 

4/  4*  *•* 

•  0  • 

<£> 

AH 

•  •  • 

4 

• 

4> 

MU 

^  >  •  ■ 
MU 

A> 

§  a  • 

4/  •  ■  ' 

*,  4  •• 

*hrH 

/, 

<7 

M 

4k  - 

<!> 

MW 

Similarly 

h 

,  k  ... 

1  *yz 

4 

J> 

4*  - 

K 

-s 

it 

l 

4  4*  ••• 

... 

4* 

4 

• 

£ 
M. J 

Ai- 

4v 
•  •  • 

'Wa/ 

4 

£ 

4**1. " 

MV/V 

^  ^  - 

?h 

0 

Hence 

A  6*.  ” 

.  ^ 
M/V 

4 

h 

n  i 

/i 

A  4>JL^ ' 

C  m  *— — 

= 

» •  • 

4/  4*  *  - 

•  •  • 

% 

• 

t> 

MV 

0  • 

*  •  * 

A/V  ^ 

^ t  •  • 

■  *„  0 

4 

4  •• 

1  & 

*  1 

22 


3.  Modified  matched  filter  for  a  multiparameter  model 
Assumptions 

Instead  of  assumption  (a^ )  given  at  the  beginning 
of  Section  2,  let  us  introduce  assumption 


The  signal  sfc 


is  given  by 


where  CyC.^, . . .  ,cp  are  unknown  constants  and  flt>f2t *  ’  '  *  ,fpt 
are  known  fixed  functions. 


Problem 

We  wish  to  test: 


ft  : 

'■ 


4- 


/.  C-  -fy  t-Af  ( ,J*7KJL  fit  <*/£  'tAu 
C/>  C2->' 


Matrix  notation: 


c  - 

Cc„ 

• 

/< 

c 

A 

1  L 

*•  * 

K  f 

•  •  1 

y»/v 

5  -  C  *; 


■f 


Likelihood  ratio 

The  likelihood  ratio  is 


/  A  ^  ylfiio  .  !,*'-■£  t&O 


t 


X  /J  .  )KA  &</ 


AU)~  €Xp[  ~  x 


23 


The  maximum  of  A  (x)  with  respect  to  all  values  of  c  occurs 
when 

dyjLp  -  *  0  >  or 

t*  **v(un' 

_ y  y _ 

This  maximum  is 

/nuttAfo)  -  expl^f'c'-  j  ifyti-f'c' 

-  O*  <C  icfi 

-  exP  '  wf 

-**F[i  (fyf 

where  we  have  vaed  [(^tf )"']- and/M.'*/*’ 


Decision  rule: 

As  before,  we  let  G  -  2  log  A0  .  We  thus  obtain  the 


Maximum- likelihood  estimates 

As  before,  we  find  the  maximum-likelihood  estimates  of 


c  and  8  are 
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Let  us  now  find  the  expected  value  of  c.  It  is 

E{^f'  (fyf)"}  -  Ety/Lt' 

=  Cf/lf'fyf)''  =  C  _ 

Hence  E  {c t ■  c  bo  c  18  an  unbiased  estimate  of  c.  Also 

£{$}=  HZ*}  ’  ^  ’  5 

/\ 

so  s  is  an  unbiased  estimate  of  *. 

The  covariance  matrix  of  c  is 

t-£{u-c)'(z- o}  » 

But  under  both  hypothesis  HQ  and  hypothesis  H,,  we  have 


Hence 

if =  fr] } 

“  /a  fuf')'1  j 

~  (^A^')  '  fa  f'  (fa-f'T  (mm*-  4yc=l) 
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Hence  c  has  mean  c  and  covariance  matrix  1 . 

The  covariance  matrix  of  s  is 

E{  (s-s)' (s-s)}  = 

=  f '(*/.?)"  i/  ^  f'( WVf 

=  n^rf-f 

Hence  s  =  cf  has  mean  s  «cf  and  covariance  matrix  /  ^  . 


Miscellaneous  notes 

A 

We  note  that  the  maximum  likelihood  estimates  c  and 
"s  are  Independent  of  the  scale  of  the  covariance  matrix.  That 
is,  suppose 

<t>  =  <J*y  a* x/  n-  <f>~'  -  where  ^  is  known 

but  the  scale  factor  v  is  unknown.  Then 

&-  'xy-'t'df-'E'V  ,  5- 

independent  of  0’"*'  .  Thus  assumption  (d)  may  be  so  relaxed. 

4 .  Elimination  of  the  trend  components 
Assumptions 

As  a  preliminary  step  in  the  analysis  of  random  processes, 
one  might  try  various  methods  to  eliminate  as  well  as  possible 
any  trend  components.  One  approach  to  this  problem  is  to  make 
use  of  the  models  discussed  in  the  foregoing  sections,  with  the 
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following  changes  In  terminology. 

(1)  sfc.  Instead  of  being  called  the  signal.  Is  now 
called  the  trend  component. 

(<?)  n^..  Instead  of  being  called  the  noise.  Is  now 

called  the  trend-free  component. 

(3)  As  before,  the  observed  random  process  Is  x 

t 

We  shall  make  assumptions  (a2),  stated  In  Section  3,  and  (b), 

(c),  (d),  (e),  (f),  (g),  stated  In  Section  1. 

Problem 

Given  that  xfc  -  s*.  +  nt,  estimate  the  trend  component 

V 

Maximum  likelihood  estimates 

As  a  solution  to  the  problem,  the  maximum  likelihood 
estimates  of  c  and  s  may  be  used.  They  were  given  at  the  end 
of  Section  3,  and  we  recall  that  they  are: 

C  =  U/Lp)  (iAVf 

s  --  c  /  =  'V>  /"J V 

where  ,  ,  ■  le  the  covariance  matrix  of  the  trend-free  component 
nt,  and  where 

Also  we  recall  that  we  may  relax  assumption  (d)  to 

(d1)  The  covariance  Is  equal  to 

<j>  -  O'* 

tA. 

where  ^  is  known  and  (F Z  is  an  unknown  scale  factor. 
Then  we  let  ,  and  hence^t*  1  .  Substltutlng^.^^" 
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into  the  expressions  for  c  and  s  we  obtain 

c=  (WrY 
s  =  vy'ftif-'f')'1-? 

which  shows  that  the  maximum  likelihood  estimates  are  independent 
of  the  unknown  scale  factor,  j-*1  . 

Relaxation  of  the  normality  assumption 
Instead  of  (b)  we  now  assume: 

(b^)  nfc  is  a  random  process  with  an  unknown  distribution. 

Least-squares  estimate 
We  have 

/K  ~  C-/  +  />i 

and  we  wish  to  estimate  c.  The  least-squares  estimate  c  is 
that  value  of  c  such  that  the  sum  of  squared  errors 

\J  ~  C-f)C%-C, \)  -  minimum. 

We  have 

f \-o  ■■ 

or 

cff-  */' 

Hence  the  least-squares  estimate  is 


c  =  stf'm  0"' 
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To  compute  c  we  see  that  we  do  not  need  assumption  (d);  that 
is,  we  do  not  need  to  know  the  covariance  matrix  of  nt  in 
order  to  obtain  the  least-squares  estimate  c.  This  fact  makes 
the  least-squares  estimate  extremely  useful  in  practice. 

The  least-squax’es  estimate  is  unbiased  since 

r{cjj= 

(and  using  -  cf) 

£{<?[-.  cf-r(ff)"  ‘  ■ 

V 

The  covariance  matrix  of  c  is  defined  to  be 

f^'0'Cc-0] 

Since 

-  (L<  fWffpy1 

--  Afxfpy*-  cfreffy 

-  f'(fO"  *  <■ 

we  have 

c-c  - 

Hence  the  covariance  matrix  of  c  is 

(c  -  c)'Cc  -c)]  -  Efc'Kf'WJfafYffy"]} 

■  f/’  ^  r  fff  rj 

which  is 


_ f  e{^  }  { Yff  O'1 


29 


Best  unbiased  linear  estimate 

We  wish  to  find  an  estimate  c  which 

(1)  is  linear  in  the  observations,  i.e. 

C  *■  JLJt 

where 

^  [ 4'  4  4* 

(2)  is  an  unbiased  estimate,  i.e. 

£{cj  *  C 

(3)  is  best  in  the  sense  that  the  covariance  matrix 

of  c  is  "less  than"  the  covariance  matrix  of  any  other 
linear  unbiased  estimate  'c,  i.e. 

E{(l-t)'a-c)}  <  E{(c-c)'(t-c)} 

(More  precisely,  the  "less  than"  sign  <  as  used 
here  means  that 

-  E{(Z-c)'( c-c)} 

is  a  positive  definite  matrix.) 

Using  (1)  we  have 

C  -  (s  t,h)  =  (<.+  +  srO  J. 

~  C  4  Jr~  +  yTL  Y- 

Uslng  (2)  we  have 

C  '  {  C  }  -  £  £  C  'fJ-\  -h  E 


--  c  tJ-+  eI  qj- 

-  C  { 2-  AisHtJL 

Thus  we  have  the  constraint 

■f/'  *  I  x  identity  matrix. 

Also  we  have  c-c=nb  so  the  covariance  matrix  of  'c  is 

£{(c-c)'(c -c)j  =  E  {  UJ-)} 

-  />i! /n.  ■£■}  -  J-'  E  (rnW}  Jr  *  J-'  •ir . 

Hence  b  may  be  determined  as  follows: 

Minimize  the  pxp  matrix  bV  b 
subject  to  the  constraint  fb  -  I. 


We  may  use  the  method  of  Lagrange  multipliers.  Introduce  the 
pxp  matrix  A  as  an  undetermined  multiplier.  We  then  wish 
to  minimize 

J-  J  2  'A  (-fJ-  -j) 

with  respect  to  b  and  x  .  We  have 

2JL  -  Q  :  Jr  '  4>  *  A  f  =  o 

&  o~ 

2  -q  :  +2=1 

Solving  these  equations  for  b  and  A  we  have 

Thus  2=  -  . 

Pj^-J  gives  -  '-f  4>  2  2  -J  j 


Also 
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or 

r-  -  a  $,ry' 

Hence 

j.~ -f  ry  ~  f'(-f  f’i')'1 . 


We  recall  the  notation  for  1  was 

ytt.*'  4>" 


Hence 


Thus  the  best  unbiased  linear  estimate  c  of  c  is 


which,  we  see,  is  the  same  as  the  maximum-likelihood  esti¬ 
mate  obtained  under  the  normality  assumption  (b).  The  co- 
variance  matrix  of  c  is 

£{Y£-c)Vc-c)j  =  $  Jr  ~  7 

which  is 
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5.  Time-Invariant  filter 


3.1 


Assumptions: 

(a)  The  signal  s^  has  a  known  fixed  shape. 

(b)  The  noise  n^.  is  a  random  process  with  an  un¬ 
known  distribution. 

(c)  The  mean  value  E  |ntj  of  the  noise  is  known 
to  be  equal  to  zero. 

(d)  The  covariance  4>tr  =  E  |ntn  j  of  the  roise 
is  known. 

le)  Time  t  is  a  discrete,  integer-valued  para¬ 
meter. 

(f)  The  observed  random  process  is  =  s  +  n 

t  t  t 

(g)  The  random  process  xt  is  observed  for  t  =  1, 
2,  ...,  N 


(h) 

(1) 


The  observed  random  process  x,.  x„,  ...,  x„, 

.  .  .  .  -^linear  N 

is  passed  into  a  time-invariant  filter  with 

coefficients  ‘  * /^M  ^to  be  determineci) . 

The  actual  output  of  the  filter  is 


ft  =  ft  *t-s  (*’ 

(J)  The  desired  output  of  the  filter  is  Z  (t  = 

v 

0,  1,  2,  ...  M+N),  where  Z ^  is  a  known  fixed 
function . 


5 .2 Problem: 

We  wish  to  determine  those  values  of  the  coeffl- 
clents  ^0 *  1  *  •  •  •  >  such  that  the  mean  of  the 

sum  of  squared-errors  between  the  desired  output 
Zt  and  the  actual  output  yt  is  a  minimum,  that  is, 
such  that 


MinimOTtv 
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5.3  Matrix  notation: 

We  define  the  (N  +  M  +l)  x  (M  +  1)  matrices 

[X  0  O  1 


0 

? 

; 

# 

(5  0 

0  o 

\  0  •  *  • 
5/  s0 

1 - 

o 

5/V  V' 

?  % 

• 

o  0 
^  0 

• 

1 _ 

■y«,  o 

m 

O 

0 

**-1 

f 

; 

0  0 

0  0 

/ft 

A'-i 

Thus 

x=  s^/v 
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Let  y ,  Z,  and  e  be  the  (N  +  M  +  1)  x  1  column  vec¬ 
tors 


and  let  6  be  the  (M  +  1)  x  1  column  vector 


w 

t  : 

fa 

Thus 

7"  V 
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Because  X-  S+ A/  wL  £{A/}  *Oy  ElA/'}*0) 


we  have 


Hence 


'r‘  £{  z'z-Zz'X/!  +fX‘Xf  } 

'  E(zz -  Z-z'(S+N)f  +p'(S*N)’(S*N)  p} 

~j{z'z-iz-S/S  4p's'sp+f s'N'Np} 

Zz'2z'^  +fsvf  +  p'Ei**}fi 


§Z=0  ■  -Zz'S  *  ip'S'S  +  i.fl{n'n}  *o 

or 


or 


52  =  S'sfi  +  E{ A/>j  £ 
p  =  f {/V/Vj  ]  >9  2 
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6 .  Time-Invariant  filter  In  the  noiseless  case 
6.1  Assumptions 

The  assumptions  are  the  same  as  those 
given  in  Subsection  5.1>  except  that  now  It 
is  assumed  that  there  is  no  noise  (i.e.  n^.  = 
0).  In  other  words,  assumptions  (b),  (c), 
(d)  are  omitted,  and  (f)  becomes 

(f,)  The  observed  process  xfc  is  the  fixed 
function  s. ,  i.e.  ;  s. . 


6.2 


6.3 


Problem 

We  wish  to  find  that  value  p  of  j  for  which 
the  sum  of  squared  errors  between  the  desired 
output  z.  and  the  actual  output  y  is  a  minimum, 
that  is,  such  that 
H'V 

^ zt  ~  (z  ~'Y)  ~  MhuTnVPl. 

Determination  of 

Prom  the  result  of  the  last  Section  (Subsec¬ 
tion  5.4)  the  desired  ^3  is 


/»  -  [3'S ]Vz.  . 
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7 •  Spike  filter 

7 . 1  Problem 

The  spike  filter  is  a  specialization  of  the  time- 
invariant  filter.  The  spike  filter  is  designed 
such  that  with  the  signal  as  input  it  will  pro¬ 
duce  little  or  no  output  while  the  signal  is  en¬ 
tering  the  filter,  a  large  positive  spike  when 
the  signal  has  fully  entered  the  filter,  and  lit¬ 
tle  or  no  output  thereafter.  The  spike  filter  is 
also  designed  to  have  little  output  when  noise  is 
its  only  input. 

Suppose  we  want  the  spike  to  occur  at  time  t  -  tQ. 
Then  we  let  the  desired  output  be 

r  i  t-  Ca 

2t  '  l  0  t  t0 

or  in  matrix  notation 

'  0  ' 

0 

L~-  o 

I 

0 

.0  . 

where  the  one  occurs  in  the  position  of  • 

*0 

7.2  Solution 

With  this  value  of  the  desired  spike  filter 
is 

£>  ~  [S  B  S  Z 


8.  Time-varying  filters 
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Consider  the  diagram: 


Ideal: 


Actual: 


We  let  T  be  the  present  time.  We  assume: 
(1)  The  signal  is  of  the  form 

st*  ,vt  *  Jr,  CA  tit 


where  is  a  random  process  with 
and  known  autocovariance  ^>vv(t,s) 
where 


C-,  >  Cq  1  ...  c 

it:  m 

are  unknown  but  fixed  constants,  and 

f  f  f 

llt'  1  2t'  •••’  mt 

are  known,  non-random  functions  of  time  t. 

(2)  Thi  desired  output  at  time  T  is 

cP 

ZT "  \  h 

£  *  ”«0 

where  the  coefficients  kfc  (  -  °©  <  t  <.  00  )  of  the 
ideal  filter  are  known.  For  example,  in  case 
of  prediction  ot  units  ahead,  k.  =  Sm, 

u  rr  OC } 


0  #/><* 
I  t=Tt« 
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Qfi 


SO 


_  2  s 


s 

T+oL,t  t 


5,  =  5. 


Ti-oC 


Aft 

yUT+°i  +  j&,QA 

(3)  The  actual  output  is  given  by  (at  time  T) 


y  -  I 

Jr 


or 


where  the  coefficients  of  the  finite  filter  a^, 
a2>  ...  aT  are  to  be  determined  so  that 


ZT"/^T)Z]  =  minimum 


under  the  constraint  that  the  actual  output 

equals  the  desired  output  when  =  sfc  and 

for  all  possible  constants  C, ,  ...,  6  . 

id  m 

This  constraint  may  be  written 

2t=  h  *-t  sf 


«  T 

^  X  s  =  1  a  % 

t"<io  f  1  t  c 


Letting  c:  -  1,  cg  »  0,  ...  =  0  we  obtain  sfc  =  flt,  and 

so 

T 


,?4  4  ■  ,5  *.  4 . 


Similarly,  we  obtain 


2  ^  \  fit  Jtc. 
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Hence  the  constraint  is 
to  7- 

y*  ~  J;  a *  A/ 

(4)  The  actual  input  Is 

where  the  noise  nfc  is  a  random  process  with  zero 
mean  E  =  0  and  known  autocovariance 

and  known  cross-covariance  with  v. : 


End  of  Assumptions 


Let  us  now  transform  to  matrix  notation.  Write: 


1XT 

row 

vectors 

1  x  ■•0 
row 

vectors 


1 


s  » 
n  - 

X  a 

rj ■ 

a  a 

V  a 

V  a 

0  - 
k  a 


(Sj,  >  •  ••>  S  rp ) 

(^1 1  n2 >  •  •  •  >  ^>p) 

(Xx #  Xg >  • . • >  Xp) 

'  Jl'  f  J2’  f JT^ 

(ap  i  &2  >  •  •  •  >  aip) 

(vl<  v2»  •  •  • »  v>p) 


f  =[  :  \:a  mxT  matrix 


’/r*l 


C  a 


(C1  * 


cm) 


F  = 


V.PJ 


{•••»  v_iJ  vqj  v i <  •••) 

(  •  •  •  >  B  Q  >  •  J  •  >  •  ) 

(  •••>  ^0 *  *  •••) 

'  ^  fJrl'  fJ9'  fJl'  ■••J 


P. 


1  /m 


:  an  m  x  oo  matrix 
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Then:  s  a  v  +  cf 


s 

-  V  + 

cF 

eT  = 

Sk1 

1 

yT 

=>  xa' 

Fk' 

a  fa' 

(the 

constraint) 

E{ 

(zT  -  y 

T>2} 

zs 

E  j  (Sk' 

-  xa'  )2 

} 

E(s) 

=  cf 

V 

ix}  .  E 

H  - 

E(S) 

»  cF 

x  = 

s  +  n 

e] 

cf 

yT 

a  Xa  '  a 

(s+n) 

a' 

E^t] 

a 

■W 

a '  . 

a  cfa" 

£"^because 

!  Of 

cons 

traint 

ZT  = 

»  Sk' 

e! 

lZt]  ' 

E  ^s}  k' 

a 

cFk' 

•A — 

’Let  us  now  determine  the  optimum  operator  a  -  (a1>  a  ). 

The  mean-square-e  rror  is 


where 


~  e{zt}~'z£{zt'k*}+  £\ 
z  i<  '  z  i*  «•'  + 

i 

tif  Eh'*)  :  TxJ 

iThe  constraint  is  Fk'  .  fa1.  Let  ^  -  (^  ,  ...,^  ) 

be  Lagrangian  multipliers.  Hence  we  minimize  (with 
respect  to  a): 

J-  fJ']  . 


Thus 


or 


5J  = 

da. 


-3.  <t>  +  3  a.  <£>  -  2  Af 

*'£  /t/f. 


=  0 


(1) 


*  A*  “  «-  *?; 


Necessary  and  sufficient  condition  for  the  con¬ 
strained  minimum. 

Now:  Let  u  =»  v  +  n,  so  that  x=u+cf,  E(u)=0. 

Hence 

E(kU)~  £({<*.'+  T  C  ‘)(tl  +  Ci)j  =  E{u'u}+  ic'd 


or 


(2) 


faf'c'cf  tcb.lL  <t>u ^  *A 


M  -tv 


Now 


^  JC-  E(zj'Z)  -  E  ([SA']*)  -  E{[(ymcE)A  ']/yj 

Maiu,  £( k)-‘ 0 }  E (V)  *  0 

"  £  {a  (y  4  F'c')(u.c  c/)J  =  A.E(v'u)4  Jt  F'c'c-f 


or 


(3) 


Eq.  (1)  IS  If  +  a 

Z* 


The  constrains  is  Pk'  =>  fa'  or 
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Subst  (2)  and  (3)  gives 


■\{>  A  £<*«.•)+ A  F'i'cf 

1  f  <-  A  £(Vu)  =■  a.tm  * 


Thus 


=  a.  tUUL  +  d  fc  'cl  3  ai 

(df'-ArJc'c-f 

•m  Tit  . 


(4) 


2f  ^  ^  £(vu)  -  a ^u. 


Now  define  p  and  q  to  be 


f  /  /«. 

AE{Vu}*  £^uu 

Then  (4)  becomes 


( ^  •  ^?xa7  /*u,tu+  )  >  f  fyu. 

\$  ■■  ixt  !»«•■  ««&•>,  y- M {vtffc!, 

+  f  &u.~  a  ^U. 


or 

<5)  rv  *  ?• *- 

This  is  the  desired  solution  except  that  we  must  de¬ 
termine  'X  .  To  do  so  we  substitute  a  -  q  +  X  p  into  the 
constraint  kF1  =»  af1  .  We  have 

Af'  *  (fit-  *f){' 

which  is 

^  ^a/;  **  JlF‘  ~  p  f 

or 

a-  Ur'-f)(f'V 

Hence  the  desired  filter  is 


44 


By  letting  T  vary,  we  thus  obtain  the  optimum  time-varying 
filter  a  as  a  function  of  T. 
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9.  Detection  Filter 

Consider  the  following  case: 


Ideal:  ^Signal 


■aideal  filteij 


Desired  outputj 


Actual: 


[Signal  + 
\random  nois 


Finite  filter 


Actual  outputj 


(1)  The  Ideal  System: 

VJe  assume  that  the  signal  has  the  form 


i  - 1 


v/  here 


are  p  known  functions  of  time  t  (t  an  integer)  and  where 

S  j  *■  )  >  + 

are  p  unknown  constants.  The  ideal  filter  is  an  infinite* 
non-reallzable,  time-varying  filter  described  by  known 
coefficients  f^Tt  .  The  desired  output  is 

z.  «  5  kt  st  =  \  V  cvf^ 


"t  ' '  Tt 

T=  -  o o 


=  i  --  i  c‘ 

itf  L~' 

CO 

where  ^>t  ^nowr1, 

7 '--co 
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We  are  interested  in  the  desired  output  zt  over  the  time 
interval  t»l,2,...,N,  and  so  we  let  the  1XN  row  vector  z  denote 

Z  ”  C  3  >  •'* 

Also  we  let  g  be  the  known  p*N  matrix 


and  let  c  be  the  unknown  lip  row  vector 

c  *  ( c '  C ,  •  ••  C  ) 

Then,  clearly,  the  desired  output  over  the  time  interval  of 
Interest  is 


where  c  is  unknown  and  g  is  known. 

(2)  The  Actual  System 

We  assume  that  the  actual  input  is 

where  a^.  is  the  signal,  as  given  above,  and  n^.  is  random  noise 
with  zero  mean,  i.e. 

E{^t\  =  o 

where  e|--  J  denotes  ensemble  average. 


We  assume  we  know  the  actual  Input  x^  over  the  time  interval 
t»-Mfl , -Mt2 ,...,-l,0,l,2,...,N,  and  so  we  let  the  I  X  (MfN)  row 
vector  x  be 


4  = 


st 

'M+Z 


*  •  • 


/£  y)L  /it  . 

»  >  /  J  *4  > 


We  are  interested  in  the  desired  output  over  the  time 
interval  t-l,2,...,N,  and  so  it  follows  that  we  are  interested 
in  the  actual  output  yfc  over  the  same  time  interval.  Hence 
we  let  the  1VN  row  vector  y  be 

Let  the  Nx(MfN)  matrix 


QL  ~ 


1 

\-yii-2 

•  • •  . 
/V 

a. 

X M 

n. 

y}-M  +  i 

a. , 

a. 

/V / 

a 

...  (L. 

/V-V 

denote  the  finite,  time-varying  filter,  where  a  is  such  that 
input  x  and  output  y  are  related  by 


Note  Prime  indicates 
matrix  transpose. 
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Since  //£  *  \  *  >U£  >  we  write 

,¥■  -  $  V-  ^ 


where 


^ -MH  3  -/itz>  '  *j  S-/’  ^o’^/  iS«i> 


*"*  ^ 

-/*>/  »  '  -//^i  J  •*  .^0  > •  •• 


We  recall  that 


£< 


/  4  <  ^  c’  (C'-C*,-,C>). 


A 


Hence  define  the  p*(MfN)  matrix  f  to  be 


or 


■f 


Then 


■f 

V^' 

r 

t/,-a hi  ’ 

••  .  fu 

»  > 

...  / 

)  7/(a/ 

/ 

t, 

/ 

7  3.N 

■f 

fr**' 

f' 

UJ 

lvJjA£  C  ~ 

tt* 

-HU, -Hu, 

5  =  C  -f 

t  t  \ 

l*(h+N)  (l*jt>)  ft  (Mhj) 
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The  actual  output  y  is  therefore 

/£  aj  *  (s+ At)  aJ  =  So.'  +  alo-  -  q^cl 

which  has  ensemble  average 

£{yj  -  +E(ai<)l}  = 

since  cfa'  is  a  constant,  and 

£U}  -  o . 

In  summary: 

The  actual  output  is  y  **  cfa'  +  na'  . 

The  desired  output  is  z  -  eg. 

The  ensemble  average  of  the  actual  output  is  E  \y\ 


Definition  The  actual  output  y  is  said  to  be  unbiased 
provided  that  its  ensemble  average  E{y\  is  equal  to  the 
desired  output  z  regardless  of  the  value  of  the  weighting 
factor  c,  i.e.  provided 


cfa'  =*  eg  regardless  of  c. 


It  therefore  follows  that  y  is  unbiased  if  and  only  if 


fa'  -  g 


■This  is  called  the  unbiasedness  constraint 


/ 

/KCL 

) 


=•  cfa'  . 
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Schematically,  we  have: 
Infinite  time  series: 


of  white  noise 


Let  us  first  consider  the  special  case  in  which  the 
random  noise  nfc  is  white,  i.e. 

r  )  ( 4  t:S 

■*  ^  o  aV++Mr^*i. 

where  O'*  is  the  common  variance.  Since  xfc  =  nt+3t  with  = 

we  have 

-  sriT  S  sumA K  E  =  S  —  C  f 

where 

Thus  we  have  MfN  observations 

'P  - 

concerning  which  we  make  the  following  assumptions: 


51 


(1)  Their  ensemble  averages  are  linear  combinations 
of  the  p  unknown  parameters  C*Cc, ,  ...  5  C^,)  Thus 

^  C^Li  (t*-M+!y  •,  N ) 

or 

Efai’  c 

where  the  matrix  f  of  order  p"k(M+-N)  is  known  and  is 
termed  the  design  matrix,  and  where  the  row  vector 
c  is  unknown . 

(2)  Their  covariance  matrix  is  the  product  of  an 

unknown  Bcalar  Q~2  and  a  known  positive  definite 
matrix  "J  .  Thus 

‘  E{(*t  ‘  %)} = 
or 

=  E{('£- s)'  fa- s) j  -  E{ska t  j 


(3)  In  the  special  case  now  under  discussion,  4>~  \  “ 

Identity  matrix,  i.e.  each  observation  xt  has  the 

Barae  variance  CT2,  and  every  pair  x.  ,x  is  uncorrelated 

w  r 

(Vr). 

(4)  We  assume  the  rank  of  f  is  p,  and  that  p  <  MtN. 

According  to  the  principle  of  least-squares,  we  estimate 
C,  jCj.,  •  •  •  <  simultaneously  by  selecting  those  functions 
C(,CA)  °f  "M*  which  minimize 

j'  f  U-  icJ.J  =  c*-c ou<i1 

j.  ^  t  /./  1  1 

t  --Mil  C 
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with  respect  to  c^Cg* . . . ,Cp  considered  as  independent  variables. 
We  have,  by  differentiating  J  with  respect  to  c  and  equating 
to  zero. 


SI 

dC 


=  £(4.-c-{)(--f)  =  0 

cH‘= 


or 


which  are  the  normal  equations,  or  equations  of  estimation  for 
the  parameter  c.  Since  f  has  rank  p,  there  is  no  non-null 
vector  v  such  that  -fa  C  and  therefore  ~/\r at 

positive,  so  ff1  is  positive  definite.  Hence  the  normal 
equations  have  a  unique  solution. 


Since 

£J 

dc 

we  have 

II 


-2.($  r-  c  -f-f') 
2  41 


That  is,  the  matrix  of  aecond-order  differential  coefficients 
of  J  with  respect  to  c  is  2ff',  which  is  positive  definite,  so 
J  has  an  absolute  minimum  when  c  c  M  fa  4 ')( f  4 /)~' 

Now  let  us  consider  the  ensemble  average  of  c.  We  have 

£{«*?-  efafxffrh  £{*} -pW1 

But  Efo}-  S-  C-f  .  Hence 

-cft'wr1  -  c . 

Hence  the  estimate  c  is  unbiased,  in  the  sense  that 


e{*U  <• 
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The  covariance  matrix  of  c  is 


ctr(c,?)  ,  rtf  W) ') 

-  e{[«- sj 

-  #*>]  f'Ctf'V 

--rl((  f’Y'ff'M)-1 

-  crz(frj[ 


Tiie  least-squareB  estimate  c  of  c  1b  a  linear  function 
cf  the  observations  x,  i.e. 


G 


,)L  ^  (  f  f  0  j  E  wAuiUl  dsj/^l^  $  ($  f  7 


Consider  any  other  linear  estimate  of  c,  say 


=  L 


Because 


( L  u 

richt  x  -j*-  ) 


E{*L j--  £f*R  =  c/£  , 

we  see  that  '£  is  unbiased  for  all  c  provided  that 

f L~  I 

The  covariance  matrix  of  c  is  . 

C*V'itL)'%L}  =  L  unrl^L 

-  <yaL'  L  . 
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We  now  wish  to  show  that  the  least-squares  estimate  'c 
Is  better  than  any  other  unbiased  linear  estimate  o'  in  the 
sense  that,  for  each  parameter  Cj, 


/>» 


/i 1*4.  C. 


V 


z 


c  • 

J 


Proof  We  have 


2  Xa  -  XA  -  XL  ■*  L'L 


We  recall  that 


Hence 


Also 


Also 


X-  fC-ff'T1  w  dz  4  L-i. 

L'A  -  L‘  44447'  =  1  *  (44'V 

XL  =  (Ztf  -  [((( -  [(ff')T  - 

i'a--  (frr-f -r(fry'--  mv 


© 

( ff)'1  ® 
© 


Using  @,(g),(^,  we  have 

x\+  (l-  a)'(l-a)  -■  W)'‘-  (4(7'- 


Hence 

c r*  /.'L  -~ 
t*v(Z£)  - 


-  /LY. 

C*v'( c ,0)  -t  (X^Ci'  A)  ^-3) 


(tiV  +  L'L 


Each  diagonal  element  of  cov(c\ c)  is  therefore  minimized  if 
the  corresponding  column  of  L-  A  consists  entirely  of  zeros. 
Hence  c  is  the  best  linear  unbiased  estimate  of  c. 

QED 
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(4)  Unbiased  estimation  of  linear  combinations  of  the 
weighting  factor  c  In  the  case  of  white  noise 

We  now  wish  to  consider  unbiased  linear  estimates 
of  the  desired  output  z«cg,  where  c  is  the  unknown  weighting 
factor  and  g  is  a  known  pXN  matrix  (see  page  ).  Suppose 
that  2  ■  xi  is  an  unbiased  estimate  of  Z.  That  is, 

cj=  E&u  ' 

whatever  c .  Then 


The  covariance  matrix  of  Z  is 

Cfnr(z2)  '  /*.£)  -  X ' -  Cr^Jt'X . 

-  ^  jj  where  c  is 
the  best  unbiased  linear  estimate  of  c.  ItB  covariance  matrix 
is 


A  A 

Consider  now  the  estimate  2-  -  C.  ? 


a>v-(2,z)  =  Cn-(ijt  cevCc, 


ff" 


; 


Now 

4  (i-yu-ij)  =zuj) 

m*i(tt'ri  -tsvfrj  -rm'u 

’Jtu.  1'  7 

Therefore 


'a  y 


***■*■  * 
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or 

*»(?£)*  c«r(St2)+  <rxtt-*j)'(tej) 

Each  diagonal  element  of  cov(z,z)  is  always  less  than  or  equal 
to  the  corresponding  diagonal  element  of  cov(z^z),  which  shows 
that 


*x*-*f‘  ((ft 

is  the  best  unbiased  linear  estimate  of  the  desired  output 

z=  c;. 

(5)  Unbiased  linear  estimation  of  the  weighting  factor  c 
in  the  case  of  colored  noise. 

We  now  drop  the  assumption  that  the  noise  is  white, 
that  is,  we  drop  assumption  (3)  on  page 

Let  T  denote  the  lower  triangle  matrix  such  that 
4>  =  T'  T  and  define  K--,/.!'1. 

Then 

£■«.»  £{*T"l*  E(*} r'=  c4  T' 

and 

(T"Y  j"  *  (T'VVVt*' 

=  (T)"  o-zT'  IT  "  “  oJL&-,),T'  T7~' 
»  o-UTT")'  TT‘  =  <r% 


BO  tl=  ALT’* 


z'- 

Z 


where 


/V 

C 


ie  white  noise. 
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Since  T  ^  is  non-singular,  fT  ^  has  the  same  rank  as  f,  which 
is  p.  Using  the  results  on  page  ,  we  see  that  the  best 
unbiased  linear  estimate  of  c  is 


2  -  u.i(fr')'{(fr')(fT")T] 

=  *r'[('T'7f,/ifr/T"'  f 

=  *(r"r')  v{  aT"T"'){'f 

=  *  (T'lV  f  {  i  ,  n 

c  =  ,V-  <£'  f  {£  4> '  f')  ' 


The  covariance  matrix  of  c  is 

-  o-a  [  f  Tr"  f'T  , 

Cw  ("c  :c)  - 

Tbe  estimate  c  is  the  value  of  c  at  which  the  quadratic  form 

c*-  c  f)  4>"  fo-cty 

attains  its  minimum. 

(6)  Unbiased  linear  estimation  of  linear  combinations  of  the 
weighting  factor  c  in  the  case  of  colored  noise 


The  best  unbiased  linear  estimate  of  z  - 


is 
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with  covariance  matrix 

^-(z',£)=  crVf/7"  « 


(7)  Determination  of  the  finite  filter  a'  subject  to  the 
unbiased  condition 

We  wish  to  find  a'  such  that  E(z-y)'(z-y)  is  a  minimum 
subject  to  .  We  have 

J-  E{(z-y)(z-y)J  =  -  c  ia.'~  -  c  to.-,;  i  a.')} 

-  E{(ha.')Y it  a.')  J  =  cl  £ }  cl  -  <yz  x 

Introduce  the  Lagrange  multipliers^3  ( "/\n  }*)  and  minimize 
J  -*  2~k( y  -  i  &')  .  Setting  the  derivative  of  this  equal 
to  zero  we  obtain  ,\f  .  Solving  the  simultaneous 

equations  for  a  and  ^  : 

cl 

■fa-'  -  <J 

we  obtain 

«.=  fUCCj’Cf'  a-- 

Hence  the  actual  output  is 

fc'f '( ft"'!  'Vf  ~  Z  *  6  / 

10.  Computational  Aides 


faw) 


as  given  in  subsec¬ 
tion  (6)  above. 


Suppose  we  wish  to  divide  the  z-transform 
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A  \  _  .  '»L,  a  7  L.  a 

(^9  •  tlcZ  +  “■,  Z-  +  ^JL*-  ■*■  /»>-> 


)t  L^Z 


sttl'A 


-*>L 


of  the  finite  operator  (d-c  t  j*stn  by  the  z-transform 

I—.  sU.  .  ,•*/  7^  -/  ■  •  *  4-  Z  +  ‘K-i 

F(z)  --  d6z  +  <*,£  +  ^ 

of  the  finite  operator  c^,  £X_...  (><a  We  have 

40  >  J  'Vi 


TJU) 

Rzj 


a  /*n-/n  j 

A  *  ^A,z 


Then 


A  = 


*  A2  * 


,  sW.-sV.Vl. 

A.z.  +  • 

A 


A, 


A.* 


<y- 


a1 


0 


ex 


/.v 


<*0 

*0 

*, 

a/ 

1 

<*o 

0 

<* c 

a. 

/ 

A, 

4* 

<*% 

0 

(9 

^  0 

d2 

os 

F 

o£ 

i 

<v 

0 

(9  i 

OS 

dc 

0  , 

<*, 

1 

o'* 

c/,  0 

df. 

*L< 

'  “2. 


and  in  general 
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where 


A, 


o/„ 


A*  i 


o^0  0  0 

(/,  -K  0 
A,  A  < 


0 

2 


a. 

t 

n. 


^  ^ 

indicates  determinent. 


i(*  e  ->".‘'4 
be  '»\*--<-C( 


Example  1:  Divide  5Zb  +  3z‘J  +  Z4  +  2Z3  +  3Z2  +  6Z  +  7 
by  2Z  +  3Z3  +  4Z2  +  JjZ  +  7.  These  coefficients  are 
3i  1 >  2,  3,  6,  7)  and  (2,  3*  5>  7)-  The 

quotient  is 


A0  z.~f  4,  £  +  /4  x 


where 


A  =  .1 
' "o  J, 

,  ^-  = 

i  | 

H  | 

2  51 

33  >  ' 

2 

3 

0  5 
z  3 

°r  /4„  -  JE 

-Z  J 

A5 

-1 

1 

) 

l 

4- 

3  1 
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11 .  Squared-magnitude  devices 

11.1  Assumptions 

(a)  The  finite  time-series 

xQ,  x1>  . ..,  xN 

represents  (N+-1)  consecutive  observations 
from  a  regular  stationary  stochastic  pro¬ 
cess  with  zero  mean  and  with  autocovariance 

and  spectral  density  <£>  (u>) . 

(b)  The  infinitely  long  time-series 

•  •  •  >  0  »  0  i  #-q  >  >  •  •  •  »  sfy; !  0  ,  ... 

formed  by  letting  ^  =•  0  for  t  <  0  and  for 
t  <  N  is  the  input  to  system  with  impulse 
response  bt>  The  impulse  response  bfc  may 
be  infinitely  long  in  both  directions,  i.e., 
bt  may  be  different  from  zero  for  both  t-^°° 
and  t  -go  . 

(c)  The  output  time-series  yfc,  which  in  general 
will  be  infinitely  long  in  both  time  direc¬ 
tions,  is 

o O 

(d)  The  output  y.  is  the  input  to  a  squared-mag- 

z  2 
nltude  device  whose  output  is  y.  . 

2  “ 

(e)  Finally,  y,  is  the  input  to  a  device  that 

2  ^ 

suma  y  from  t  =*  -  00  to  t  =oo  and  then 
t 
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divides  the  result  by  N*1  .  Thus  the  out¬ 
put,  denoted  by  W  >  is 


Schematically,  we  have 
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65 

3.  Normal  Equation  Forms  In  Optimum  Filtering  Problems 

In  section  2  of  this  report  the  formal  equations  for 
several  types  of  filters  (matched,  time -invariant  least 
square,  time  varying,  etc.)  were  developed.  In  sections 
3  and  4,  we  will  concentrate  on  the  time -invariant  least- 
square  filter.  However,  we  will  generalize  this  case  to 
include  both  multi-input,  and  multi-dimensional  processes. 

Section  3  develops  the  normal  equations  for  single, 
multi-input,  and  multi-dimensional  process  optimum  filters. 
Highly  efficient  computational  schemes  for  solving  these 
equations  are  given  in  section  4. 


The  transient  autocorrelation  of  a  complex  single 
process  is  standardly  written  as 


where  x  Indicates  complex  conjugate. 

If  we  think  of  x  as  a  matrix  valued  process  (real  or 
complex)  and  X  as  the  transpose  of  X,  then  the  auto¬ 
correlation  of  x  can  have  the  same  formal  definition. 
This  convention  will  be  used  throughout  section  3.  and  4. 
in  order  to  preserve  the  formal  similarity  of  the  differ¬ 
ent  systems.  The  reader  will  note  that  this  alters  the 
form  of  the  single  process  development  as  was  given  in 
section  2  . 


3.1  Single  Process 

The  single  process  is  characterized  by  one  set  of 
numbers  corresponding  to  discrete  intervals  of  time. 
Thus,  the  series 


St  *  I.  ,  .5,  .2 5,  ./«, 
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corresponds  to  the  signal 


Sit) 


t  </ 


In  this  section  we  will  determine  the  optimum  least-square 
filter  for  operating  on  such  a  process. 


3.1.1  Assumptions 

a)  The  signal  5*  has  a  known  fixed  shape. 

b)  The  noise  is  a  random  process  with  unknown 

distribution  . 

c)  The  mean  value  E  { ftt]  of  the  noise  is  known 

to  be  zero. 

d)  The  auto-covariance  of  the  noise  E  Tij  j  i3 

known . 

e)  Time  is  a  discrete  integer  valued  parameter. 

f)  The  observed  random  process  is 

xt"  7lt  . 

g)  The  random  process  is  observed  for  t»  0,  /,  2  ...  V. 

h)  The  observed  random  process  in  convolved  with 

the  coefficient* of  the  impulse  response  of 
a  linear  filter  A/ <  A/  (to  be  de¬ 

termined)  . 

i)  The  actual  output  of  the  filter  is 

i,  xt.s  t.h-l, 

J)  The  desired  output  of  the  filter  is  N 

where  Zf  is  a  known  fixed  function. 


3.1.2  Statement  of  Problem 

De termlne  i  such  that  the  sum  of  the  errors  squared 
is  a  minimum: 


E 


— "  in  1  rrum 


(3.1-1) 
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X  •  5  +  N 


(3.1-3) 

(3.1-4) 

(3.1-5) 

(3.1-6) 

(3.1-7) 

(3.1-8) 


(3.1-9) 
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Thus 

fX  *  y 

3.1.4  Determination  of  the  filter  4  . 

Let  ^  be  the  expected  value  of  the  sum  of  the 
squared  errors 


(3.1-10) 


e  transpose  (3.1-11) 


Now,  if  we  take  the  derivative  of  0C  with  respect 
to  ~f  and  equate  it  to  zero  In  order  to  minimize  d-  , 
we  find 


jf  -  0  E{(«-f/)x}  -  0  (3.1-13) 

0r  £  {  C  X  j  =  0  (3.1-14) 

Thus,  the  condition  that  oC.  be  minimized  is  equiva¬ 
lent  to  saying  that  the  error  C  must  be  normal  to  the 
process  x  .  For  this  reason,  this  equation  is  known  as 
the  normal  equation  for  f  . 

Now  expand  the  terms  in  the  nornal  equation: 


But, 

and 


E{  eX  -  f  X  X }  *  o. 
x  ■  S  +  N 

E  {n}  *  0  , 


(3.1-15) 
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Thus 

£{  *(W^j  -  f(S  +  A/)(Wjj  =  0  (3.1-16) 

■£  f  5  5  +  £  f  /V/V'J  J  -  2  S  (3.1-17) 

If  we  examine  the  multiplication  S  5  In  detail 


we  see  that  the  multiplication  of  the  first  row  of  5  by 
the  first  column  of  fo  ,  the  second  row  by  the  second  col¬ 
umn,  etc.,  is  lixe  the  0  lag  of  a  transient  auto¬ 
correlation  of  a  portion  of  S  .  Since  each  multipli¬ 
cation  is  taken  over  different  limits,  the  terms  along  a 
diagonal  of  SS  will  not  be  the  same. 

For  ease  in  computation,  we  desire  that  the  term 
55  *£  [a/  N~j  be  Toeplitz,  i.e.,  that  the  elements  along 
each  diagonal  be  the  same.  This  can  be  accomplished  in 
two  ways: 

1)  I£N»fJi  and  S  is  a  stationary  random  process, 
then  we  make  the  approximation  that 

n  . 

7^77  L  S;v  =  E  {*•••,•  S',1  (3.1,19) 

L'  n-i 

The  normal  equation  becomes 

f  j^E[55  +/VVj  r  £^?5j  (3.1-20) 

or 

=  I 


(  R 
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y'  1  L  /-  /•  •••  ] 

(3.1-23) 

H  •  [  £«  Z**n  •  /  J 

(3.1-24) 

C  -  [  ^ o  '  C»iM- l  ] 

(3.1-25) 

The  normal  equation  now  becomes 

S’  S'  t  e{a /rvj  j  =  E'  S' 

(3.1-26) 

or  {  F{  =  g 

(3.1-27) 

where  £  =-  S'  S'  +  €  {  H  H  } 

(3.1-28) 

S  £  *'  B‘ 

(3.1-29) 

r 

K,  . 

r  y 
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(3.1-31) 


L  9-  9»  • 


(3.1-32) 


3  -  L  3*  ^ 

3.1.5  Determination  of  the  expected  error  . 

From  equation  (3. 1-11 )  we  defined  &(.  to  be 

ai  --  E  (  e  e  }  (3-1-lla) 

If  we  substitute  q  z  g  -  ^  X  >  we  find 

«  -  E  {  e  (i-Wj  }  (3-1-33) 

But,  since  e  is  normal  to  *  and  f  is  a  linear  op¬ 
erator,  we  get 


*  *  E  { 


e  z 


E{(t-  (K)  ?  ) 


-  E  (  ?  z  -  ( *  i) 


Z  2  -  f  Cj 


(3.1-3**) 


since  £  f  ♦'i  J  *  0  . 

3.1.6  Prediction 

A  special  case  of  interest  is  that  of  predicting  future 
values  of  a  series  from  past  values.  For  this  case  we  set 
the  desired  output  to  be  the  signal  at  some  future  time: 


2  J  S, 


(3.1-36) 


z,  ,  ,  ,  2  *  J  ~  [ 
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where  k  is  the  prediction  distance.  The  normal  equation 
becomes 


f[  SV  *  £{vv]j  -  s,  5 

Now,  if  there  is  no  noise,  this  takes  on  the  form: 

If  fi  •  •  •  fi] 


(3.1-36) 


r, 

K,  r. 


*  *  f  /  I 


#  # 

.  k-. 


1 .  * 


*  M  ] 


(3.1-37) 


3.1.7  Prediction  Error 

In  many  cases  we  are  more  interested  in  the  error  in¬ 
volved  in  predicting  rather  than  the  actual  prediction. 
Equation  (3.1-8)  defined  the  error  to  be  the  difference 
between  the  desired  output  and  the  actual  output.  This 
can  be  written  as 


e  »  z  -  f  X 


(3.1-38) 


since  we  have  assumed  that  there  is  no  noise  involved.  If 
we  define 


(3.1-39) 

and  expand  5  suitably,  then  the  error  can  be  written 

e  ./'S 


(3.1-40) 
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3.2  Multi-Input  Processes 

The  multi-input  process  is  characterized  by  having 
several  separate  time  series.  These  time  series  are  ar¬ 
ranged  so  that  each  point  in  time  is  represented  by  a 
column  matrix  where  each  term  in  the  matrix  corresponds 
to  a  specific  time  series.  This  leads  to  the  idea  of  a 
matrix-valued  time  series  for  which  we  wish  to  find  a  mat¬ 
rix-valued  least-square  optimum  filter. 


3.2.1  Assumptions 

a)  The  n*l  matrix-valued  signal  St  has  a  known 
fixed  shape 


S*  = 


si 


5; 


b)  The  "x/  nBtrix  valued  noise  *1,  is  a  random 
process  with  unknown  distribution. 

c)  The  mean  value  of  the  noise  E  is  known  to 
be  zero . 

d)  The  covariance  of  the  noise  }  is  known. 

e)  Time  is  a  discrete  integer  valued  parameter. 

f)  The  observed  random  process  is 

xt  --  5,  *  n, 

g)  The  random  process  is  observed  for  t  ■  0,  I,  ■  •  •  1 

h)  The  observed  random  process  is  convolved  with  a 


X  x  n  *  *1 

matrix-valued  linear  filter 

f,  L  v 

(to  be  determined)  . 

i) 

The  X*  1  /in 

matrix-valued  actual  output 

of 

the  filter  is 

M 

Y*  1 

21  ft  x,.s  t  •-  N 

J) 

The  J fxl  X  i  n 

matrix-valued  desired  output 

is 

?r  t  s  n-!,n  , 

.  ,  .,  N 

where  2*  is 

a  known  fixed  function. 
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Usj.ng  these  assumptions  we  can  now  determine  the  fil¬ 
ter  f  by  following  the  development  for  single  processes 
(3. 1.3-3. 1.7) .  Each  term  of  the  matrices  that  are  defined 
there  will  now  be  a  matrix  rather  than  a  scalar.  This 
is  essentially  only  an  interior  .grouping  of  terms  within 
the  matrix.  If  this  grouping  is  removed ,  leaving  the  in¬ 
dividual  scalar  terms  arranged  as  they  were,  the  matrix  will 
have  a  normal  configuration  and  interpretation. 


The  normal  equation  is 


] 


(3.2-1) 


where  the  expected  error  is 

^  :  ?  Z  -  /  J 

(3.2-2) 

How  r;  is  a  n  *  *  matrix  which  contains  all  terms  of 
the  (  s  lag  of  the  autocorrelations  and  crosscorrelations 
of  the  input  series.  (Note  that  ^ *  r.  ).  Likewise  a, 

Isa  X m  n  matrix . 

The  restriction  in  the  assumption  3.2-  h)  ,  c  )  an  J  j) 
that  follows  from  the  fact  that  we  can  make 

only  linearly  Independent  combination  of  the  inputs. 

0.3  Multi -Dimensional  Processes 

A  multi-dimensional  process  will  be  characterized 
by^a  multi-dimensional  data  array,  in  two  dimensions  a 
array  might  have  the  form 
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(VJe  make  use  of  script  symbols  in  order  to  emphasize  the 
multi-dimensionality  of  the  process.) 

For  some  applications,  one  of  these  directions  may 
be  thought  of  as  time.  With  no  loss  of  generality,  we  v/ill 
suppress  this  interpretation  for  the  development. 

3.3.1  Terminology 

a)  Dot  Product 

*•  7] 


b)  Displaced  Dot  Product 

il y*'"""'"  (3.3-2) 

t, 

c)  Reversed  Dot  Product 


L 


<.  1 


*  —  t, ..  i. 


(3.3-1) 


(3.3-3) 
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d)  Convolution 


-N  ‘-J 


(3.3-4) 


3.3.2  A 


a) 

b) 


c) 

d) 
c) 
f) 


o  / 


h) 


i) 


sumptions 

The  signal 

.d  -  OO  C  < 

has  a  known 

fixed  shape 

• 

The  noise 

V  <  i‘,  <  - 

is  a  random 

process  with  unlcnown  distribution. 

The  mean  value  £  f  Vi*'  tw|  of  the  noise  is  known 
to  be  equal  to  zero. 

The  covariance  £  [v\'‘  n J‘  "’)  of  the  noise  is 

known . 

The  dimensions  are  discrete,  integer-valued 
parameters . 

The  observed  random  process  is 


«.  l«v 


-  V 


4»  t» 


is  con- 


The  observed  random  process  K 
volved  with  a  linear  filter  with  coefficients 
^  I  i  i(  ‘  ^  (to  be  determined) . 

The  actual  output  of  the  filter  is 


1 


I-  ■  J" 


/  '•  '«  j.-i,  •  j.-t»  1 

{  •*  J 


The  desired  output  of  the  filter  is  ^ 
where  ^  is  a  known  fixed  function. 
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3.3*3  Statement  of  Problem 


We  wish  to  determine  those 
I  i  t\  s  imj,  ouch  that 
squared-differences  between  the 
actual  output  ^  Is  a  minimum; 


values  of  the  coefficients 
the  mean  of  the  sum  of  the 
desired  output  and  the 
that  is,  such  that 


=  minimum 


(3.3-5) 


We  define  this  difference  to  bo 


(3.3-6) 


3.3  Solution 

We  wish  to  fii.il  values  for  £  |  s  i',  4  w, 

that  minimizes  o{  . 


Takinc  the  derivative,  we  find 


(3.3-8) 


Thus,  we  see  that  this  minimum  criterion  implies  that  the 
error  £  is  normal  to  the  process  X  when  dotted  with 
it  over  the  dimensions  for  which  is  to  be  defined. 

For  this  reason,  this  equation  (3*3-8)  is  known  as  the 
normal  equation  for  /  . 


I 
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We  can  expand  this  equation  to  find 


But  since 


*  ;  A.  *  ~n 

and 


(3.3-9) 


E  fAJ  ■'  *  >  E{}}  2  7  -  €  H  ;  0 


v/e  have 


(3.3-10) 


or,  if  v/e  let 


Si 


M" 


< 


J* 


•b 


VI 


) 


(3.3-11) 


and 


3 


/• 


J" 


(3.3-12) 


Then  v/e  have 


1  Jx  - 


(3.3-13) 
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3.3.5  Error  Estimation 

We  can  also  find  a  value  for  <5>(  • 


<X 


l  H  • 


(: 


But,  from  the  normality  condition,  this  is 


cX  = 


e  f  j  -  r "  I 
[>•?]  -  it 


M  -  [rv 


c 


.3-14) 


•3-15) 
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4.  Recursive  Schemes  for  Normal  Equations  cf  the  Toeplltz 
Form. 

The  solution  of  the  least-square  optimum  filtering 
problem  as  shown  in  section  3*  involves  solving  a  set 
of  simultaneous  equations.  In  general,  there  will  be 
one  equation  for  each  coefficient  in  the  filter.  Ihe 
time  and  space  requirements  of  a  standard  simultaneous 
equation  computer  routine  for  such  filters  is  prohibitive 
for  almost  all  non-trlvlal  problems.  This  section  out¬ 
lines  several  more  efficient  schemes  for  arriving  at  the 
desired  filter. 

These  schemes  take  advantage  of  the  special  forms 
of  the  autocorrelation  matrix  /?..  In  the  single  and 
multi-input  cases,  R  has  the  form 


r. 

r, 

r,  . 

''V, 

r, 

r. 

r  . . 

r  j 

r,  .  . 

•  y'n.  I 

K-„.( 

C,.,  . 

.  r. 

where  r,-  -•  r.(  .  In  the  multi-dimensional  case  we  could 

formulate  a  multi-dimensional  matrix  that  would  have  the 
same  property.  That  is,  in  each  case,  all  terms  along 
each  diagonal  are  the  same.  Thus,  given  the  top  row 
(  and  possibly  the  left  column  if  rM  ?  /  (  r, )  )  the  mat¬ 
rix  is  fully  specified. 

The  recursive  technique  Involves  beginning  with  a 
filter  of  length  1  ,  using  this  to  find  a  filter  of 

length  2,  etc.  At  each  step  of  the  process,  we  first 
find  a  vector  (the  prediction  error  operator)  of  length 
m  that  is  normal  to  each  row  of  the  matrix.  In  the 
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multiple  cases  we  must  find  more. than  one  such  operator. 
This  operator  is  then  used  to  Increase  the  length  of 
the  filter  and  of  the  operator  itself. 

Once  the  prediction  error  operator  for  an  auto¬ 
correlation  matrix  R  and  the  optimum  filter  for  a  par¬ 
ticular  ^  L$>  ^  Inhomogeneous  part  of  the 

equation  have  been  found,  the  optimum  filter  for  which 
the  inhomogeneous  part  is  shifted  by  one  (to,  say, 

^  ‘  )  can  be  determined  by  a  recursive  step. 

The  principal  advantages  of  using  the  recursive  tech¬ 
niques  are  time  and  space  savings.  The  standard  solu¬ 
tion  of  simultaneous  equations  requires  time  proportional 
to  n  and  space  proportional  to  n  The  recursive 
technique  reduces  these  requirements  to  «  1  and  n 
for  time  and  Bpace,  respectively. 

An  Important  side  benefit  of  uBing  thlB  scheme  1b 
that  we  can  compute  the  expected  error  at  each  Btep  of 
the  process.  This  allows  us  to  formulate  a  criterion 
for  determining  the  length  of  the  filter.  Ab  the  filter 
becomes  longer,  the  expected  error  will  decrease  and 
then  level  off  at  Borne  value.  The  shifting  of  the  de¬ 
sired  output  (the  inhomogeneous  part  of  the  equations) 
may  also  produce  a  decrease  in  expected  error. 

The  development  of  tne  recursive  scheme  for  slngle- 
and  multi-input  cases  will  be  made  using  two  different 
notations.  The  first  notation  (labeled  Expanded  Nota¬ 
tion)  will  be  the  matrix  notation  used  in  section  3. 

The  second  notation  (labeled  Compact  Notation)  will  in¬ 
volve  the  use  of  a  set  of  vector  operators.  Thus,  we 
will  define  the  vectors: 

a)  r  1  (r. ,  r,  .. ,  rH  )  autocorrelation 

b)  ■-  ("q,  ."«?„)  prediction  error  operator 

c)  "b  r  ("b,  ./'b*  )  hindsight  error  operator 

d)  V  :  Tf,  optimum  filter 
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(The  prediction  error  operator  and  hindsight  error  opera¬ 
tors  will  be  defined  later  in  this  section.  The  super¬ 
script  to  the  left  of  the  symbol  indicates  the  length 
of  the  vector  as  defined  above.  Thu3  ''V ,  * c\  ,  and  *b 
are  of  length  ♦ 1  and  is  of  length  m  ).  V.'e 

also  define  the  operators 

a)  Reversing  ft  ( *4  )  ~ 

b)  Zero  (increases  the  length  by  one) 

zr«)  --  ("*. . n«„,o) 

c)  Sliding  D'(”a  )  -  (  n, ,  n, ,  K,'.„ 

d)  Inner  Product  =  Dot  Product 

["a.v]  •'  "a.  r.  * 

The  recursive  scheme  for  the  single  process  was 
first  formulated  by  N.  Levinson  (1950).  S.  M.  Simpson, 
Jr.,  proposed  the  recursion  for  shift  of  the  desired 
output.  E.  A.  Robinson  extenden  the  single  process  to  the 
multi-input  case.  Finally,  R.  A.  V.’iggins  extended  the 
single  process  to  the  multi -dimensional  case. 


84 


4.1  Single  Processes 

The  normal  equations  for  a  single  process  (see  sec¬ 
tion  3*1)  optimum  filter  are  of  the  form 


where  r,  ,  f,  and  are  scalars.  Associated  with 
this  equation  is  the  equation  of  the  unit  distance  pre¬ 
diction  error  operator  n  (see  section  3*1.7). 


where  C*0  -  !  and  o(  is  the  expected  error  (see 
section  3.1.5).  We  note  that  the  hindsight  error  op¬ 
erator  n  b  ’■  [_  "  ,  .  •  ,  ”  b.  ]  '■  I  (i.e. ,  the  operator 

which  "predicts"  past  values  of  the  series  from  future 
values)  is  Just  the  reverse  of  c\  since  H.  is  symmetric. 

4.1.1  The  Levinson  Recursion  to  Larger  Operators 

This  development  is  a  modified  version  of  that  given 
by  Levinson.  He  uses  the  orthogonal  operator  C  (the 
prediction  operator)  instead  of  the  prediction  error 
operator . 
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4. 1.1.1  Extension  of  the  Prediction  Error  Operator 


Expanded  Notation 

The  scheme  for  extending  " to  is  to  first 

make  an  approximation  of  M‘A  by  a  where  "  -  0  , 

We  substitute  this  new  vector  into  the  equations  and 
examine  the  solution.  Then,  by  adding  a  similarly  ex- 

n*  i  ~ 

tended  hindsight  operator  to  3  we  can  get  the 
real  solution. 


Thus,  first  we  extend  ” 
right  end 


by  adding  a  zero  to  the 


r 


I 


-  [*«,  *  .  •  o,  <*„' ] 


(4.1-3) 


where  <5<J  :  a,  r„,t  r  ••  +  r, 


Since  H  is  symmetric,  the  hindsight  error  operator  is 
Just  the  reverse  of  .  Thus,  the  extended  hindsight 
operator  is  the  reverse  of  S  .  Thus  if  we  weight  and 
add  «  reversed  to  $  and  substitute  we  find 


i 


+  kt o  ..  Oto(*+  k. <*, 


(4.1-4) 


Now,  if  we  choose  kn  such  that 


(4.1-5) 
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Then  we  have  formed  a  new 


Mr) 

a 


Note  that  the  new  expected  error  la 


/I  +  I  -  M  ^  \  0(  M 


Compact  Notation 

The  formula  corresponding  to  (4.1-5)  Is 


<*n 


to  (4.1-6)  is 

;  ?("*)  ♦  *.  K(iC«)} 

and  to  (4.1-7)  is 


["*,  D'-'Cr)]  . 


(4.1-6) 


(4.1-7) 


(4.1-5a) 


(4  .l-6a ) 


.l-7a ) 
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4. 1.1. 2  Extension  of  General  Filter 


Expanded  Notation 

We  now  use  the  prediction  error  operator  H a  to 
extend  the  length  of  the  filter  ^  .  Here,  as  In 

section  4. 1.1.1  we  make  a  first  approximation  to  f  by 
adding  a  zero  to  the  end  of  f  to  form  f 


Cr,,..,x,o] 


k, 


-'  [  S-.  • 


(4.1-8) 


V.  r; 


where  f  «- 

If  we  weight  and  add  reverse  to 


we  get 


[r, 


r,  .  k; 


^  .  r. 


-  ‘  J  n  1  ^  ^  ‘ 


(4.1-9) 


Now,  if 


k:  -  J), 


Then  the  new  filter  is 


Compact  Notation 

The  formulae  corresponding  to  (4.1-10)  is 

-  [  n£  ,  D  "( fV  )  J 

<=<  n 


h 

and  to  (4.1-11)  is 


""f  -  z('(  )<  h ) 


(4.1-10) 


(4.1-10a) 

(4.1-lla) 
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4.1.2  Recursion  to  Move  Output  Origin 

We  wish  to  find  a  filter  that  satisfies  the 

normal  equation 


"  r '  h  -  ? ' 

where  3  ’  •  [  ..  g*.,  J 

the  equation 

'(  r'  ■  3 

where  ^  [  J,  •  $  ^  1 

Just  as  easily  have  chosen  to 
rather  than  the  right. 


•  (4.1-12) 

from  the  solution  of 


(4.1-13) 

.  Note  that  we  could 
shift  g  to  the  left 


Ojr  first  approximation  to  the  desired  filter 
is  to  shift  the  filter  "  f  by  one. 


/  ^  /• 
where  ^ 


(4.1-14) 


Two  types  of  inhomogeniety  have  been  added  to  the 
right  side.  The  first  is  a  weighted  version  of  the  M v~ 
vector.  This  can  be  removed  by  subtracting  the  negative 
of  the  hindsight  operator  (  [  0~  V., , , "  b  ,  J  ). 


(4.1-15) 


where 


;  (V 


(4.1-16) 


89 


Now  we  add  the  prediction  error  operator  to  alter 
to  the  desired  value.  Thus,  let 


(4.1-17 ) 


Then 

"r 


[o.x,..X..]  . %,] 

+  k,.[:*.rk  J. 


Compact  Notation 

The  formulae  corresponding  to  (4.1-17)  is 


(4.1-17a) 


n  -  i 


and  to  (4.1-18)  is 

T  ,  o''  (7)  -  V„  r( <?('"„)}  - 

where  -f,  =  =  0  . 


4.2  Multi-Input  Processes 

The  normal  equations  for  a  multi-input  optimum 
filter  (see  section  3.2)  are  of  the  form 
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where  is  an  n  *  n  matrix 

is  a  ^  *  i  /;*i  matrix 
and  is  a  £*!(<*  matrix. 

The  unit  distance  prediction  error  filter  n 'a  and 
unit  distance  hindsight  error  filter  *  i  are  associated 
with  the  K  matrix: 


where  ,  l, ,  and  arc  n  *  n  matrices, 

-  *la  -  L  >  the  unit  matrix, 
and  and  /?„  are  the  expected  error  matrices 

_  n 

for  a  and  »  respectively. 

The  only  formal  difference  between  this  case  and 
the  single  process  case  is  that  the  R  matrix  is  no 
longer  symmetric .  This  causes  the  hindsight  error  op¬ 
erator  to  differ  from  the  prediction  error  operator 
reversed. 

4.2.1  Recursion  to  larger  Operators 

4. 2. 1.1  Extension  of  the  Prediction  and  Hindsight  Error 
Operators 

Expanded  Notation 

We  proceed  here  as  in  the  single  process  case.  The 
first  approximations  to  the  new,  longer  operators  are 
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made  by  simply  adding  a  zero  to  the  present  operators. 
Then  they  are  added  together  to  remove  the  inhomogeneity 
caused  by  the  zero. 


When  we  add  the  zero,  we  find 


,<?] 


r'  '  r J  a,oi^] 


r-„-,  ••  r. 


rn,,  +•  ••-<-  r, 


£o,,..,xJ 


^  •• 


r. 


(4.2-4) 

o,  /»,] 


(0-5) 


Now  we  weight  ”  b  and  add  to  *  5  and  vice  versa: 


f 


a., 


•  •  i 


<X.n  *  fin 


(4.2-6) 


[/£■'  -it 


K 


K. 


(4.2-7) 


If  we  choose  ^  and 


such  that 


0('n  +  h<  -  O 

and 

(3^  +  A  ,  a(  n  ■  0 


(4.2-8) 

(4.2-9) 
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h*i 

Then  the  new  filters  Cf  and 


are 


""a  -  \_“a. , "o„,  o]  i-  k,  [o.nl>„, ...  ”t.J  (4.2-ao) 

~‘b  *  [OX  kl > . %.,<>] 

and  the  new  a^.,,  and  are  given  by 


&(*+i  '  k» 

(4  ,2-12} 

r  (3  r\  *■  k  t  °(  A, 

(4.2-13) 

Compact  Notation 


The 

compact  forms  for  equations 

(4.2-8)  and 

(4.2-9) 

are 

k. 

■  -  [“<?,  ff/o- 

')}]  a- 

(4  ,2-8a) 

h, 

••  -["t  ,  P"*7 

’  M 

r 

>  ] 

■i 

(4.2-90 

and  for  equations  (4.2-10)  and 

(4.2- 

11)  are 

/ 

<  z  r«;  *  r\ 

z  r 

~i)) 

(4  .2-10a) 

""b 

-  Z  +  /? 

Ef 

(4.2-lla) 
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4. 2. 1,2  Extension  of  the  General  Filter 


Expanded  Notation 


We  can  now  use  the  hindsight  error  operator  to  ex¬ 
tend  the  right  end  of  the  general  filter.  The  scheme  is 
to  add  a  zero  the  the  operator  as  a  first  approximation 
and  then  weight  and  add  the  hindsight  error  operator  to 
remove  the  inhomogeneity.  These  steps  give 


(4.2-12) 


where 


y*  -  X  r_ 


Thus,  if  we  set 


*  h  (.  fin 


(4.2-13) 


n«  / - 

then  +•  is  given  by 


(4.2-14) 


Compact  Notation 


The  compact  forms  of  equations  (4.2-13)  and  (4.2-14) 

are 


K  -  (3..,-  l"f,  A' 

"X  Z  (X)  T  ^  (?X  fc;  (4.2-lita) 
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4.2.2  Recursion  to  Move  the  Output  Origin 

The  desideratum  here  is  the  same  as  in  the  single 
process.  From  the  filter  that  satisfies 

(4.2-15) 

and  the  error 
that  satisfies 

"f'R  - 

where  ^  [  3 . .  $»■,  ]  (1.1-16) 


«  -  2 

whore  j  -■  t  g.  J  ,  _ 

operators  we  wish  to  find  ' 


Proceeding  as  in  the  single  input  process,  we  find 
that  shifting  and  adding  the  filter  and  the  hindsight 
error  operator  gives 

=  [ *m  »  3"  *  '*  3  n"  ] 
(4.2-17) 


where  K,  * 

Now  solve  the  equation 


•+ 


r. 


*■»  ; 


+■  hr  -  Cj  t 


(4.2-18) 


to  find  the  weighting  for  addition  of  the  prediction 
error  operator.  The  new  operator  is 

V  -  [  o  ."4  . ...  .X-.]  - 

-X  [  o, — b . -t.]  *  (^2-W> 

-  [%. 


H-  / 


M-  I 


1 /  » 


-J 
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Compact  Notation 

The  compact  forms  of  equations  (4.2-18)  and  (4.2-19) 

are 


V  -Xp"( RCV),  D"{ R("r))  ])  * 

(4. 2 -17a) 

r  --  PmTO-XF'f/u~v}  ♦ 

n  i  '  tu  o_i«D\ 


4.3  Multi-Dimensional  ProceBBeB 

In  this  section  we  will  consider  only  the  2-dimen¬ 
sional  process  in  order  to  simplify  the  notation.  All 
of  the_techniques  given  are  readily  extendable  to  higher 
dimensionality. 

The  2-dimensional  optimum  filter  is  given  by  the 
equation  (Bee  Bection  3.3). 


3 


I  -  j'i  •  "i 

I  '  j  1  ••  r*'  !  (4.3-1) 


where 


<■  i 


yyi,  and  w,  are  the  dimensions  of  the  filter 
.  We  will  seek  to  extend  the  filter  in  the 
direction  by  one  unit. 


To  accomplish  this  we  will  need  *>iK  prediction  er¬ 
ror  filters  c*'*'1’ 


(4.3-2) 
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0 

aV  .  . 

0 

«';1  .  . 

& 

Jfr-  .**  - 

l! 

_ * _  *«.  * 

^  ^  <  • 

0 

m, .  l  , 

.  . 

where 

■  ffc-.-o  > 

that  satisfy  the 

equations 

|  (  *  •  '  ' 

,  V-^’J  --  c*y,  jk 

I  t  j,  i 

0  »  yx  1  a. 


(4.3-3) 


Here,  as  in  the  single  process,  the  autocorrelation 
array  has  sufficient  (centro-)  symmetry  so  that  if  we 
reverse  the  predictors  to  become 

they  will  still  obey  the  equation. 


4.3.1  Recursion  to  Larger  Operators 


4. 3. 1.1  Extension  of  the  Prediction  Error  Operators 


Following  the  philosophy  of  the  previous  sections, 
we  make  a  first  approximation  <5  *'  *'  |  <  t,  t  to 

0  *  t,  *  M4  v 

the  extended  prediction  error  operators  it  i',  i  », 

o  i  « ,  l  k  + 1 

by  adding  a  column  of  zeros  after  the  column: 


I  ij,  &  ~ i.  ( (4.3-4) 

O'/lJ'-atl 

Since  the  K*'  '*  array  is  centro-symmetric,  reversing 
both  co-ordinates  of  a  has  the  effect  of  reversing 
both  co-ordinates  of  the  right  hand  side: 
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. . fCjJ’ 

i  -  y.  *  . 

0  i  )t  i  **>  ,  4 / 

Thus,  we  can  remove  the  J  terras  by  adding  weighted, 
reversed  predictors  to  the  original  predictors.  That 
is,  if 


gT-  ,  +■  K. .  <p(  :  0 


(4.3-6) 


Then 


4 


•|  <v 


^  *  +  £_j  *v 
/*/ 


Z,  !<u  a . 


I  i  c‘(  i 

Oi  < ,  £»*»,♦•/ 

and  a  new  error  matrix  '  is  given  by 

'  *  i  •/ 


(4.3-7) 


(4.3-8) 


4. 3. 1.2  Oeneral  Filters 


Now  we  will  uBe  the  prediction  error  filters  that 
were  found  in  the  proceeding  section  to  extend  the  length 
of  the  general  filter.  The  general  filter  obeys  the  re¬ 
lation 


[/‘“V"' . ']•-  i‘" 

If  we  extend  the  length  of  {  by  one  in  the 
direction  by  adding  a  column  of  zeros  to  form 


(4.3-9) 

1  L 
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•i 


r  r  ... 

t'-'  ■  ■  ■ 

■  i 

r  c  ■  ■  • 

Then  the  normal  equations  become 


•  /,  Ki  | 


r  ■  o 

C‘~'* 


f  I 


where 


J>  j  i 


I j>  i  -  . 

1  -  J,  t  —  , 


(4.3-10) 


f  *  j.  j  ^ , 

^  1  y/  1  '  T  ^ 


(4.3-11) 


J  j*  ‘  ~>i  *  i 


(4.3-12) 


The  last  column  of  V"  can  be  changed  to  a  given 
column  of  ^  by  adding  weighted  values  of  the  reversed 
prediction  error  operators.  Thus,  If 

-  k;  ••  f4-3-”) 

I  i  j.  i  Hi, 

|  I  ^  V  ( 

then  the  new  ^  Is  given  by 

/  -*  /  +  Z-*  (4.3-14) 

D  6  K* '  I  t.  < .  *  »*,, 

I  i  t  ,  *  v 

The  choice  of  the  end  of  ^  to  extend  was  quite 
arbitrary.  We  could  have  chosen  to  form  a  new  IL 

0  *  M, 

by  adding  the  column  of  zeros  to  the  other  end.  For 
this  case,  we  would  then  use  the  unreversed  prediction 
error  filters  and  a  slightly  altered  form  of  the  <K 
matrix. 


99 


In  the  same  manner,  the  c .  dimension  of  ^  can 
be  extended  if  we  are  given  ^ l  prediction  error  filters 
in  that  direction.  The  development  proceeds  exactly  as 
that  shown  above. 
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THE  WIENER-MASANI  TECHNIQUE  OF  MULTIPLE 
SPECTRAL  FACTORIZATION 


Abstract 

The  prediction  problem  for  single  stationary 
time  series  is  reviewed  and  the  least  square  and  Kolmogoroff 
solutions  given.  Extension  is  then  made  to  the  multiple 
case,  the  least  squares  equations  set  up  and  the  Wiener- 
Masani  factorization  described.  Heuristic  use  is  made  of 
the  Hilbert  space  property  of  time  series.  A  digital 
computer  program  for  performing  the  Wiener-Masani  factori¬ 
zation  is  discussed. 

1.1  Introduction 

The  problem  of  prediction  (extrapolation)  of 
a  stationary  stochastic  process  has  been  under  intensive 
study  for  over  20  years.  Since  the  earliest  work  of  Wiener, 
Wold,  Kolmogoroff  and  others,  a  considerable  literature 
has  developed  in  the  theory  and  techniques  of  single  time 
series,  be  it  to  predict  it  an  arbitrary  distance  into 
the  future,  to  Interpolate  it,  or  solve  one  of  the  mani¬ 
fold  filter  problems.  The  theory  of  single  time  series 
may  be  said  to  be  nearly  complete. 

The  problems  inherent  in  handling  more  than 
one  time  series  simultaneously  are,  however,  far  from 
solved,  and  the  mathematical  apparatus  necessary  is  in 
many  ways  cumbersome.  Since  the  case  of  the  single  time 
series  has  been  so  well  resolved,  one  might  question  the 
necessity  of  treating  the  multiple  case.  The  reason  is, 
of  course,  that  we  are  Interested  in  a  group  of  time 
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scries  where  the  members  are  related  to  each  otner,  i.e., 
correlated . 

An  example  of  ouch  a  group  where  Information 
is  contained  in  the  cross -correlations  is  the  3  compo¬ 
nents  of  motion  of  a  seismograph.  Clearly  the  motibn 
in  a  given  direction  will  bear  some  relation  to  that  in 
a  perpendicular  direction.  The  information  contained  in 
the  cross-correlations  ought  to  be  of  aid  in  predicting 
or  filtering  such  a  process. 

In  section  3.2  of  this  report,  a  method  of 
handling  multiple  prediction  by  least  squares  techniques 
is  given.  An  alternative  method  (due  to  Wiener  and  Masani) 
will  be  discussed  here  from  a  basic  approach.  The  method 
relies  on  very  abstract  geometric  properties  of  time  series. 
The  use  of  this  method  will  help  Illuminate  these  proper¬ 
ties.  In  addition,  since  the  method  is  computationally 
feasible,  we  will  achieve  a  useful  chock  on  the  least 
squares  approach. 

Since  much  of  what  we  will  do  has  a  close  ana¬ 
logue  in  the  well-understood  scalar  series  case,  a  brief 
review  of  the  prediction-factorisation  problem  will  be 
helpful  to  clarify  the  analogies,  and  the  difficulties, 
that  arise  in  higher  dimensions. 

The  discussion  will  be  confined  to  the  dis¬ 
crete  case.  The  approach  will  be  intuitive  and  heuristic 
as  rigorous  documentation  is  amply  available  in  the 
literature . 
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2.  Single  Time  Series 

In  what  has  become  known  as  the  Wiener- 
Kolmogoroff  theory,  we  consider  a  stationary  time 
series  ,  and  for  the  most  part  deal  directly  with 

two  quantities,  the  correlation 

l  (l) 

z--N 

and  the  power  density  spectrum 
.  ®  lu>  T 

$(«*»)=  X  Yre 

T=-<*> 

rr  and  $  M  are  clearly  Fourier  transforms  of 
each  other.  In  the  most  general  case,  the  function  ^ 
will  not  exist  as  such.  Instead,  it  becomes  necessary 
to  use  the  spectral  distribution  function  X  (w)  ,  where 
the  relation  to  the  correlation  is 

the  integral  being  taken  in  a  Riemann-Stieltjes  sense. 

If  A  is  absolutely  continuous  then,  $  exists, 
and 

<£  (u>)  = 

In  circumstances  where  ^  does  not  exist, 
the  theory  Is  more  complex;  as  Wiener  (1950,  p.  58) 
points  out,  however,  in  practice  the  behavior  of  nature 
is  not  such  as  to  give  pure  spikes  in  a  spectrum,  nor  to 
give  pure  Jumps  at  neighboring  points--corresponding  to 
a  non-absolutely  continuous  spectral  distribution.  Later, 
we  will  give  an  explicit  criterion  for  the  existence  of 
an  absolutely  continuous  distribution. 
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Let  the  entire  past  of  a  time  series  be  given, 
iiS-i  at  time  t,  the  set  of  numbers  (  Xt  ,  X^_ ( ,  (  ....  ) 

is  known.  We  wish  to  predict  the  value  of  Xt  at  time  t+<X  . 

At  best  this  can  be  done  with  some  smallest 
error.  There  will  always  be  an  error,  since  by  defini¬ 
tion  no  random  process  can  be  exactly  predicted.  We 
thus  seek  an  Xt  +  #  such  that  the  difference 
is  as  small  as  possible. 


The  Procedure  (Levinson,  1950)  is  to  assume 
that  Xt+<x  is  given  by  some  linear  combination  of  the 
past  of  the  time  series,  l.e. 


■t 


N 


K:  ok 

where  N  is  made  larger  for  a  better  fit.  Note  that  it 
suffices  to  consider  c<  =  1 .  Any  other  distance  can  be 
determined  similarly.  Note,  too,  that  stationarlty  in¬ 
sures  that  will  be  the  same  no  matter  what  choice 
of  origin  is  made.1 

The  square  error  is 

Xt.,-xX 


and  its  mean  over  the  times-T  to 


+  T  is 

N 


=  ''  J.T(  l,a" 


To  minimize,  differentiate  with  respect  to  4*  and  set 
the  derivatives  equal  to  zero. 


By  stationary  is  meant,  loosely,  that  the  underlying 
probabilistic  structure  is  independent  of  time.  Y  S 
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Thus 


t=-T  K- ' 


or 


£"mi  ^  Xt-K^.r^  ).T4i  V  X*  X*-*. 

krs<  -eTTt  _ 

T*  T  u  -T 


(2) 


+T 


BUt  'Moo  MPH  =  rT  '  the 


fi-T 


autocorrelation.  Hence  eqn.  (2)  become: 

N 


£«kV  r  =  t 


(3) 


K'  I 


These  equations  are  discussed  in  section  3»1  of 
this  report. 

Note  that  we  could  have  obtained  the  same  re¬ 
sult  directly  by  requiring  that  the  error  £  t  be  uncorre¬ 
lated  with  the  time  series: 


JjtVr  =  0  =  ^ilTXA  X,.„  y.,.r- ^,1  AA-r 


or 

N 

^  ”  fr 
K»  I 


The  identity  of  the  two  approaches  is  signi¬ 
ficant  and  we  will  return  to  this  point  later. 

In  the  above  prediction,  no  estimate  of  the 
error  involved  was  obtained.  To  find  an  estimate,  a 
deeper  knowledge  of  the  nature  of  time  series  is  necessary. 

The  cornerstone  of  this  knowledge  is  a  theorem 
of  Wold  (Doob,  1953.  p.  576). 


A 

/ 
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Th.  1.  (Wold  Decomposition)  A  stationary 
stochastic  process  X-t  may  be  uniquely  represented  as 
the  sum  of  two  processes,  X*.  =  U  +  +  "IT*. 

CO  X  t  t 

where  U*  =  £  <XK  3t.K 

KsO 

such  that  Q.0^Q.l)  4*^  ...  is  minimum-delay 
where  ^  aK*<oo  }  40  >  0, 

&  &>€>=  *Vs  ^  3nd  E.C3l)  =  0,  (5). 

lTn  is  deterministic  and  E  (UtVf  )-  0* 

By  "deterministic "is  meant,  in  a  loose  sense, 
that  a  knowledge  of  the  zeroes  of  Vt  in  some  finite 
interval  is  sufficient  to  determine  its  zeroes  in  any 
other  interval.  It  is  Just  the  absence  of  such  a  process 
V,  in  Xt  that  will  insure  that  the  spectral  distribu¬ 
tion  function  is  absolutely  continuous . 


For  the  present,  it  will  be  assumed  that 

CO 


Xt  =  1  9f 


*  =  o 


(6) 


The  transient,  or  "wavelet"  has  "minimum  phase 
characteristic"  (Robinson,  1954,  1962)  i.e.  for  all 
other  transients  into  which  we  could  decompose  Xt 

H  JL  ^  ^ 

has  the  property  >_  ^  S"  bK  O^NCoP. 


has  an  additional,  very  important,  property. 

If  an  Inverse  wavelet  is  defined  such  that 

X)  si  =  0,  JL*0 
>  -  |  Ji-0 

KzO  J 

tllen  ^  00  >  1  -e  •  the  inverse  wavelet 

K 
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is  stable  and  eqn.  (6)  can  be  inverted  to  give 

=  %  *  Xt-* 

is  the  only  wavelet  with  this  property. 

The  prediction  problem,  in  the  light  of  the 
Wold  Decomposition,  can  be  given  a  more  explicit  form. 
Suppose  an  estimate  at  time  t  of  is  wanted. 

At  t  =t  +  <*. 

Xft*  ^-troc-K 

t -o 

At  time  t,  the  terms  ^  ^9,-d  *  '  '  •  #<*  9t 

are  unknown.  The  best  prediction  is  thus  based  on  the 


known  terms  and 


S\  Oj 


where  the  missing  terms  are  a  measure  of  the  error. 

The  expected  mean  square  error  is 

e  (>c.-  Xt.«y=  e  (o..«-  •  •  •• +^0J 

But  by  (  4  )  this  is 

^  SO 

and  we  have  an  explicit  expression  for  the  expected  error. 

In  order  to  actually  make  the  prediction,  the 
^  must  be  known.  Hence  a  solution  to  the  inverse 
problem  ^ 

3*  *  S  A.  X.-k 

tf>0 

is  sought.  Since  is  inverse  to  <X  *  ,  a  knowledge 

of  will  suffice. 
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The  autocorrelation  of  is 

-  fS’oo  >-T+l  aK  S.  ^t-K^tft-P 

ft  P  -t 

-  J&v*.  -  * —  ^  o  jT  _  Xou  — - —  S  ^  6? 

-  r->  oa  >T-H  -'f—  —  r-»«o  VT_f  1  P-  T  ^P 


*  P 


P-  T,k 


(7) 


which  is  the  autocorrelation  of  at  alone.  Hence 

$  (<o)  =  JL/t  e  iWT 

and  depends  only  on  aK  .  An  alternative  means  of 
computing  $  (.u)  is  to  take  the  Fourier  transform  of 
a  =  A(k>)  and 

5  (&>)=  AM  A  M 

All  phase  information  about  at  is  lost  in 
computing  A  fa)  A  (u)  ,  and  there  are  an  infinite  number 
of  wavelet  transforms  BH  such  that  6  (w)  G*(w)  r.  A(«>)  A  M 
In  order  to  find  A  6")  from  the  spectrum  (whose  transform 
is  the  desired  wavelet)  use  must  be  made  of  the  unique 
property  of  ^  ,  --  it  i3  the  only  Invertible  wavelet. 

Theorem  2.  (Jensen's  Formula,  Ahlfors,  1953, 

P.  135) 

Consider  an  analytic  function  f Cl)  and  suppose 
f(*)v^O,  HI  4  |  ^  Then  in  the  region  1 2| 4  |  ,  J-f(i)| 

satisfies  Laplace's  equation  and  by  the  standard  theorems 


of  complex  variables 


lex  variables 


(8) 


(Cauchy  integral  formula).  If  there  are  zeroes  of  f(z) 
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inside  I'il  —  I  ,  then  equation  (  8  )  must  be  modified. 

Suppose  the  zeroes  are  located  at  ^  ( l  x  •  •  -j  h  ,  where  n 
Includes  multiplicities.  Denote  their  complex  conju¬ 
gates  by  ^  .  The  quantity 

has  poles  at  the  zeroes  of  f  (?)  and  Q(e“)=  I  • 
Fd)--awfw  is  analytic  for  —  I 

and  by  (  8  )  x)r 

=  I  -f  (O)  I  +  sLr-jj'  (X  W  ^ 

=  Vflo)+i  ^TTf7\  ^-kr{'h-\f(e-i')U6 


Hence, 


j)t 

^  \f(°A  =  ~1'£4TJ7l\  'frrf/^lf^e'<>)l &9 


(9) 


This  is  sometimes  also  expressed  as  an  inequality, 

-  xw{'£T^  <*e  (10) 

where  equality  holds  if  If.Ul  (the  condition  is  suf¬ 
ficient,  but  not  necessary). 

The  representation  is  unique  is  the  sense  that 
no  other  function  &•(?)  has  the  property  that 

9-(f>0  -*  f  (eL°) 

f-t  i  Q 

Suppose  ([>  is  factored  in  some  arbitrary  way 

$(e;<>)  =  FCe^FV1'*) 
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By  Jensen's  formula ,  3.jr 

Xejf- 1  F(°)l  *  —  Vr  lp  I*" 

if  F te)  has  no  zeroes  jjjjt:  |  ,  then  strict  equality  holds 
and  , 

IFWI^e*-  f^lKe^Ue 

'  0  0  (11) 

The  absence  of  zeroes  of  F (*)  implies  that 
F(l)  “  S  (0  is  analytic  1*1*1  and  the  Fourier  transform 
o£  exists.  By  the  convolution  theorem  of 

Fourier  transforms,  ft  is  inverse  to  the  Fourier 
transform  ft  of  F: 

%  ft-K 

Hence  i t  is  invertible  and  must  be  just  the 
wavelet  of  the  Wold  Decomposition.  Therefore,  we  have 
proved  the  following  theorem: 

Theorem  3  • 

/4*(eic)  ,  then 

(a)  |A(®)|>'^ 

(b)  Equality  holds  if  and  only  if  A  Cv)ia 
the  minimum  phase  wavelet  of  the  Wold  Decomposition. 


We  now  have  sufficient  information  about  A (w) 
to  generate  it. 

From  the  uniqueness  part  of  Jensen's  formula, 

A  (i)  is  analytic  |?l-  I  ,  and  |  A  (o>|*-=  e*^5^**' 
such  that  lim  )A(f  |  (e^) 


Ill 


Therefore,  (Robinson,  1954,  p.  125)  let  us 
seek  a  power  series  expansion  of  A(*)*  such  that 

t 

and  a  Fourier  series  expansion  of  log  £  (ete) 

$  (e1,0)-  @r\ 

Hence  for  1^^  I  , 


=  2. 

’*  A 


./n 


^n:o 


or 


A(*)f=  e* 


^  .  « 
£  A*t 


(12) 


Thus,  the  procedure  is  to  expand  log  ^  (e*'®)  in 
a  cosine  series  and  3olve  for  the  power  series  co¬ 
efficients  of|A(*)|  in  (  12  ),  and  1(r 

|M.)|S  e*  =  e*.^*&>* 

This  is  called  the  Kolmogoroff  factorization  procedure. 


3.  Multiple  Time  Series 

3.1  Preliminary  Considerations 

Notation:  The  "order"  of  a  time  series  will  refer 
to  the  number  of  single  time  series  which  make  it  up. 

Thus,  the  single  time  s es  of  the  previous  sections  have 
order  1  (scalar  case). 

The  "dimension  ’  of  the  time  series  will  refer 
to  its  physical  display .  Thus,  a  single  time  series  has 
dimension  1.  Four  3ingle  time  series  displayed  as  a 
square  array  have  order  4  and  dimension  2,  etc. 

Quantities  which  are  matrices  or  vectors,  and 
which  may  be  confused  with  corresponding  scalar  quanti¬ 
ties  are  barred:  X*  is  one  time  series;  Xt  is  of  dimen¬ 
sion  2  or  more,  etc.  * 


Outwardly  many  of  the  techniques  for  handling 
multiple  time  Beries  are  the  same  as  in  the  scalar  case. 
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We  will  make  use  of  this  similarity  as  far  as  possible  in 
what  follows.  There  is  as  yet  no  complete  solution  to 
the  problem  of  multiple  prediction,  and  there  are  at 
least  two  related,  but  .different,  approaches  to  the  prob¬ 
lem  (Wiener  and  Masani,  1957,  1958;  Helson  and  Lowden- 
s lager,  1958). 

This  paper  will  discuss  the  Wiener -ffesani 
technique,  as  it  lends  itself  to  an  intuitive  approach. 

The  two  papers  of  Wiener  and  Masani  are  clear  and  elegant; 
at  the  risk  of  oversimplification,  we  will,  therefore, 
restrict  this  paper  to  a  purely  heuristic  treatment. 

Restriction  will  be  made  to  2-dimenslonal  pror 
cesses  of  order  n,  1 .e .  n-scalar  time  series  written  as 
a  vector  X*  »  where  the  ith  scalar  series  is  denoted 
X*  •  Thl3  is  no  restriction  on  the  mathe¬ 

matics  and  avoids  the  cumbersome  notation  necessary  in 
handling  arrays  of  time  series. 

Definition:  A  function  F  (®)  will  be  said  to  be 
YiXYri  matrix  valued  if  F(0)  is  given  by  anTixVri  matrix. 

Q _ 

Definition:  ByGC&)  =  J  will  be  meant 

the  7j  xhn  matrix  whose  elements  are  the  integrals  of  the 
corresponding  elements  of  F  (0)  . 

Consider  a  set  of  matrix  valued  functions  ^  F  ] 

A 

[  |F(T>I^T  *  00 

O 


such  that 
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Then  the  set  ^  F  }  Is  said  to  be  in  over^^>A"3  . 

Suppose  P>  G  c  L  i  .  Then  define  the  "inner 
product"  of  F a^iJL  G  as 

\j  (y)  £*(?)]  &  f  (13) 

o 

which  will  be  denoted  ((F,  G)). 

Define  the  "norm"  of  a  function  FeL,  as 
the  square  root  of  its  inner  product  with  itself,  written 


It p H  =  ~)  -  [* [Vct> 


(14) 


o 

Notice  that  the  inner  product  can  be  interpreted  as  a 

dot  product--it  gives  a  measure  of  the  "overlap"  of  pCLrrufi-  Q 

over  their  region  of  def inition--the  interval  L 0,  A]  . 

The  norm  of  P  is  interpretable  as  the  "length" 
of  F.  It  has  all  the  properties  normally  associated 
with  a  length--!. e. 


||? II  >  o  ,  F  *  0  (a) 

II  a. F ||  =  M  liril  Lir)  (15) 


The  "distance"  between  2  functions  F"  is 

||  p  -  £  ||  >  0  ,  F  ^  "5 


(15a) 


The  triangle  inequality  exists: 

||  F -  3  H  L  II F- Till  +  II H- 5  H 


(15b) 


Using  the  definition  of  inner  product,  we  can  speak  of 
2  functions  as  being  orthogonal  if  and  only  if  (£  p  G))  —  0  . 
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If  the  functions  |  F,  j  obey  these  rules,  and 
if  every  Cauchy  sequence  in  converges,  then  the 

space  Lj.  is  called  a  Hilbert  space  (Halmos,  1951,  p.  17). 

Notice  that  in  such  a  Hilbert  space,  given  a 
sequence  of  non-zero  functions  Ft'  an  orthonormal  sequence 
can  be  generated,  i.e.  if 


II  Pi  It  =  k 

then  Qi-  Tjr  rt‘  will  have  "length"  1  and  the  Gram- 

Schmldt  process  with  the  definition  (13)  can  be  used 

to  construct  the  next  function. 

Suppose  two  time  series  X  y  and  of  order 

l  : 

n  are  taken  as  71  X  I  matrix  valued  functions  where  X,.  0*^ 
are  each  in  .  Define  the  matrix,1 

H  =  (  1  (f'*  ) 


where 


(16) 


This  term  is  Just  the  cross-correlation  between 
X  and  .  The  matrix  H  is  called  the  Gramian  of 

X  and  ^  .  In  time  series  analysis,  it  is  a  cross¬ 
correlation  matrix,  where  H  r  H  *  •  If,  as  here  X 

and  are  stationary  processes,  H  depends  only  on 

V  -  £  ,  not  on  f  and  S  Individually. 

The  inner  product  (  13  )  then  becomes 

((  V,  fc))  '  Hr-s 

^The  integral  should  really  be  defined  in  terms  of  an  under¬ 
lying  probability  space,  but  the  effective  result  is  the 
same  as  eqn .  (16) . 


115 


In  the  scalar  theory  of  time  series,  a  funda¬ 
mental  role  was  played  by  the  correlation.  Suppose  we 
attempt  to  use  the  Gramian  matrices  in  an  analogous  way. 
Instead  of  taking  the  inner  product  of  X,  ^  to  be  tn*.  H j 
let  the  inner  product  be  given  by  H  itself. 

Then  will  be  said  to  be  orthogonal  to  ^ 

if,  and  only  if,  Hh_s  =  0  ,  and  that  Xh  ^as  unit 

length  if,  and  only  if,  (Xj-  X*.*)  -  X 

This  new  space,  to  be  denoted  X  ,  is  no 
longer  a  Hilbert  space.  It  differs  from  a  Hilbert  space 
in  a  very  profound  fashion.  It  no  longer  satisfies 
(  15  )  or  (  lpa  ).  Moreover,  consider  the  generation  of 
an  orthonormal  sequence  of  functions  f*t. 

Suppose  (  ,  f  i)  =  K  .  Then  presumably  the 

quantity  "f[  -  ^  will  have  unit  length.  But  ft"  is 

a  matrix--it  may  not  be  invertible,  and,  if  it  is  not, 
a  normalized  sequence  cannot  be  generated,  though  an 
orthogonal  sequence  may  be,  of  the  form  such  that 

The  underlying  distinction  between  the  spaces 
and  ,  is  that  Lx  is  a  Hilbert  space,  whereas 

is  the  Cartesian  product  of  a  set  of  Hilbert  spaces. 
Each  K  individually  satisfies  the  requirements  of  a 
function  in  Lx  .  In  creating  X  x  we  have  adjoined  the 
xj  in  groups  of  n.  This  is  the  relationship  and  the 


^For  the  definition  of  square  root  of  a  matrix  see 
Appendix  A . 
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distinction . 

The  difficulty  with  singular  matrices  Is  one 
of  the  problems  of  the  multiple  theory. 

Despite  this  difficulty.  It  is  still  convenient 
to  think  of  X.  as  defining  a  linear  vector  space.  The 
visualization  of  a  time  series  as  a  vector  in  a  linear 
vector  space  is  a  very  useful  concept. 

The  past  of  a  time  series  can  be  thought 

of  as  "spanning"  a  subopace  of  X,  . 

Let  My,  denote  the  subspace  spanned  by  An 
and  its  past.  At  a  later  time  t,  the  series  and  its  past 
spans  a  new  subspace  where  ,  and  we  are 

able  to  speak  of  the  difference  space  M t -  M*  .  This 
concept  can  be  given  a  rigorous  formulation. 

In  this  same  sense  one  could  speak  of  unit  vec¬ 
tors  {*0  In  ,  much  as  we  use  unit  vectors  in  ordi¬ 
nary  Euclidean  3-space  or  use  eigenvalues  as  "unit  vectors". 

If  a  complete  set  {^t.^  coultl  be  found,  any  time 
series  X*  could  be  expanded  as 

v  ^  oo 

At  —  ^ 

However,  as  we  have  pointed  out  above,  it  may  be  that 
normalization  Is  Impossible.  This  simply  implies  that 
though  we  may  3eek  a  complete  set,  we  may  not  be  able  to 
normalize  It. 

In  summary,  for  conceptual  purposes,  a  time 
series  will  be  regarded  as  defining  a  linear  vector  space, 
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,  with  inner  product  given  by  the  Gramian,  with  xi  £ 
if,  and  only  if,  (  X(  -  0  .  As  time  increases,  a  series 
spans  a  larger  and  larger  region  of  ,  i.e.  a  new 

"basis  vector"  is  added  with  each  time  interval.  , 

of  course,  has  the  property  that  the  number  of  its  dimen¬ 
sions  is  infinite  and  the  even  more  unusual  property 
that  non-zero  vectors  may  not  be  normalizeable .  These 
two  properties  are  a  reminder  that  the  picture  of  a  time 
series  as  a  vector  is  purely  for  purposes  of  intuitive 
argument,  and  has  only  a  very  rough  truth. 

Consider  the  prediction  problem  once  again. 

Given  Xt  its  entire  past,  and  its  Gramian,  we  wish  to 
predict  in  the  best  possible  way  Xr+0(.  .  In  terms  of 

the  vector  space  picture,  what  we  have  in  Xt  and  its 
past  is  a  portion,  of  .  in  order  to  know  XtY<^  , 
we  would  have  to  know  .  The  question  is  how  to 

best  represent  with  a  knowledge  of  Mt  only, 

lacking 

In  ordinary  Euclidean  3~space,  we  could  think, 
in  analogy,  of  the  problem  of  how  to  best  represent  a  3- 
dimenslonal  vector  with  only  2  unit  vectors,  i.e. 
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The  best  representation  or  nv~  in  terms  of 
and  ex  will  necessarily  leave  as  error  a  vector  that 
is  perpendicular  to  both  e,  ,  and  e*.  .  To  find  the 
best  representation  we  seek  constants  a  and  b  such  that 
'ir  -3.6,  b  and  nr-  ~~  V  is  perpendicular  to  both 

Jk  V 

ar>d  .  Tills  gives 

ns-  1=  (nr  •  e  |  ■+-  (  or  •  . 

To  represent  in  terms  of  its  past,  we 

seek  matrices  such  that  +  * =  *£cL^  X 

n-o 

r\  __ 

where  X*.*,*  Xfct^iS  to  be  perpendicular  to  Xe  t=0-'T< ^.(17) 
The  effect  of  using  matrix  coefficients  is  to-  make  use  of 
the  past  of  all  the  time  series  to  represent  any  one  of 
them--thus  increasing  the  number  of  basis  vectors. 

U3lng  (  17  )  as  a  condition  and  the  inner  pro¬ 
duct,  (  13  )  v/e  have 

X-n*.  t  ~  D  f  •  • '  )  CO 

)  X‘t  -0  W  =  0 
or 

(18) 

Eqn.  (15  )  will  be  recognised  as  a  multiple 
version  of  eqn .  (3)  for  the  scalar  case.  It  was  pointed 
out  above  that  the  requirement  that  the  prediction  error 
be  uncorrelated  with  the  time  series  is  equivalent  to 
minimizing  the  mean  square  error.  This  can  also  be 
shown  to  be  the  case  here  (see  Section  3.2  of  this  report). 
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In  practice,  of  course,  the  entire  past  of  the  series  is 
not  known  and  I '=  •  ■  •>  N  ,  some  N  . 


3.2  Wlener-Masani  Technique 

In  considering  a  scalar  time  series,  we  sought 
another  formulation  of  the  prediction  problem— in  terms 
of  the  Wold  Decomposition  and  the  Kolmogoroff  factori¬ 
zation  technique  to  gain  insight  into  the  nature  of  the 
time  series  process  and  to  determine  the  error  involved. 

A  solution  completely  analogous  to  the  Kolmo¬ 
goroff  spectral  factorization  is  not  known.  The  Wlener- 
Masani  technique,  however,  provides  a  similar  exact 
factorization — though  the  method  of  solution  is  quite 
distinct.  It  is  more  complex  than  the  Kolmogoroff  solu¬ 
tion  due  to  the  matrix  nature  of  the  quantities  involved. 

A  first  step  in  this  technique  is  the  determination  of  a 
generalized  version  of  the  Wold  Decomposition. 

Consider  Xn  and  its  past  Mn_,  .  The  pro¬ 
jection  of  X*  on  its  past  we  will  denote  as  (xJfV,)  . 
where  the  projection  can  be  Interpreted  as  a  vector 
projectior.—onto  a  plane,  for  example. 

If  Xn  "  (Xn  |  ^  0  ,  we  will  say,  following 

Wiener  and  Masani,  that  the  time  series  is  non-deterministic 
and  define  =  •  (19) 
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For  obvious  reasons,  5 T|  Is  called  the  "innovation"  pro¬ 
cess  of  X* 

The  Gramian  of  Is 


The  vector  Interpretation  of  this  is  that  the  innovation 
must  be  orthogonal  to  each  other — by  definition. 

The  "rank"  of  the  process  is  the  rank  of  G.  If 
C  exists,  then  the  process  is  said  to  be  of  full  rank 
and  we  can  normalize  the  .  Here  is  a  demonstration 

of  the  critical  importance  of  the  matrix  nature  of  the 


inner  product  of  .  The  prediction  problem  for  pro¬ 

cesses  where  G  is  of  less  than  full  rank  has  not  been 
solved  (Helson  &  Lowdenslager,  1961,  discuss  this  problem) . 

The  Important  point  is  that  for  full  rank  pro¬ 
cesses  a  Wold  Decomposition  exists.  A  decomposition  for 
processes  of  less  than  full  rank  is  not  known .  Therefore, 
in  the  remainder  of  this  paper,  unless  otherwise  stated, 

G  will  be  assumed  to  have  full  rank. 

Theorem  (^).  (l/old  Decomposition.  See 
Wiener  and  Masani,  1958*  p.  137  for  a  proof). 

If  is  the  innovation  process  of  a  non- 

determinlstlc  time  series  of  order  q,  then 

(a)  Xn  -  Un  +  'U*  such  that 

K  -O  ' 
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and  II  U>,  ||  =  X  ((^K  ^  °° 

n~o 

where  G  -  (^0  (  .  The  matrices  4*  are  the  Fourier 

coefficients  of  U-^  with  respect  to  the  d^  ,  the 

are  minimum-delay,  and  "lr^  is  deterministic. 

Note  that  this  is  a  complete  matrix  analogue  to 
theorem  1.  As  in  theorem  1,  it  is  required  that  there  be 

cP  _  (  j.  .  x 

an  inverse  matrix  wavelet  d-^  ,  such  that 

n-o 

The  matrices  are  determined  by 

r  Q  =  (Uo,?^  by  statlonarlty- 
or  Q  =  (U0  1  3-k)  -  (X0 )  jL*)  • 


For  a  time  series  of  order  2,  a  picture  of  the 
structure  of  such  a  process  is  given  by  Robinson  (1962, 
P.  9*0. 


Since  G  is  Invertible,  we  could  normalize  ^ 
to  a  sequence  "f  h  ,  suppressing  all  mention  of  the  G 
matrix.  Since  it  is  sometimes  convenient  not  to  norma¬ 
lize,  we  will  use  the  sequence. 

Define  the  past  of  the  "irK  process  by  Tj  * 
Then,  in  the  vector  space  analogy,  the  meaning  of 
is  that  lim^  7\n  *  {0}  and  all  k.  Thus,  the 

"remote  past"  of  is  some  finite  volume  in  and 

does  not  change  in  time. 


If  a  process  has  *  -  {0}  ,  then  it  is  said 

to  be  "regular."  All  processes  of  the  form  of  tf,.  are 
regular.  Such  processes  will  have  absolutely  continuous 
spectral  distribution  matrices.  We  will  discuss  only 

these  processes;  U.,  from  now  on  XK  is  assumed  to  be 
of  the  form 


Xt  =  1 

T  n-o 


(20) 


The  prediction  problem  in  these  terms  is  the 
problem  of  finding  the^best  representation  of 


where  we  do  not  know  the  terms  4 
If,  however,  we  can  find  the  J 
we  can  make  a  best  prediction. 


a., 

,  and  the  wavelet 


The  Oramian  of  (  20  )  is 
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\ 


oo  oO 

^  -TVS  ^ 

K-0  S-0  ' 


^E,  as  ^St> 


S'  o 


G 


depending  only  upon  G  and  <2.^  ,  and  not  explicitly 
on  the  7t  .  Thus,  the  power  density  matrix  $  (u>) 
depends  only  upon  (Zn  and  G.  $  (oj)  could  also  be 
computed  directly  from  the  Fourier  transform  of  the 
wavelet  Q>n  ,  were  it  known,  by  ^  (to)  r  A  (<*>)  6A  Ci-)  (21) 
Since  there  is  no  phase  information  in  (  21  )  about  A(u), 
there  are  an  infinite  number  of  wavelets  which  will  give 
the  same  power  density  matrix.  However,  (XK  is  invertible. 

oo  _  x  ^ 

Thus,  its  transform  AG-’)  *  0^  G  C  ,  must 

k-  -P  _ 

have  an  inverse  transform 

D<*o)  '  1.  C  ^  ^ 

n:-r 

Apply  Jensen's  theorem  (  2  )  to  det  A  C1^)  G  A  * 
If  A(w)is  a  well-enough  behaved  function,  the  theorem 
will  be  applicable.  Then 

jk*[AMGA*<°)]  6  55.  J^y.  J^[A(elf)GA*(e^)]j!0 

0 

G.  is  invertible,  hence  fat  G  *  0  .  If  1*1*1; 

we  have  strict  equality  and 

faX  [A(o)GA^ Co)] 

o 
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or 

At  [A(o)GA*(o)]=  $  (e£V© 

O 

This  application  of  Jensen's  formula  suggests 
_  attempting  a  Kolmogoroff  factorization  of  $  (u>)  ,  ln 

the  same  manner  as  was  done  for  the  scalar  case  ln  Part 
II.  A  complete  analogy  would  be  to  expand  AX(Z)  in  a 
matriclal  power  series 

B0  H-  B,  "2  4-  B*  .... 

where  the  bK  are  matrices.  Unfortunately,  equation 

(  22  )  is  a  scalar  relationshlp-it  relates  the  determinant 

of  the  wavelet  with  the  exponential  of  the  integral  of  the 

determinant  of  $  M  .  There  is  no  simple  relationship 

between  a  determinant  and  the  elements  of  the  matrix 

Involved.  Nor  is  there  any  hope  that  the  elements  of  A 

cannot  Individually  vanish  for  lit ^ |  .  Furthermore,  as 

Wiener  and  Masani  point  out,  even  if  we  were  led  to 

consider  an  expansion  in  matrices  of  the  form 

^D0*d,  *  +da  - 

there  lo  a  difficult  problem  of  uniqueness,  since 

r-  —  6.  G  C  S  , 

and  commutativity  has 

broken  down  (  eA  would  be  defined  by  Sylvester's 
formula,  appendix  A).  __ 

Though  it  is  conceivable  that  these  difficulties 
may  be  overcome  by  proper  definition,  we  are  instead  led  to 
consider  an  entirely  new  approach. 

Since  CLK  =  •  a  knowledge  of  K  iS 


1 

\ 


i 
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sufficient  to  get  a  best  fit  to  the  prediction 

pO 

IE  ^IX  ^t-hCX-tx  (23) 

tx-o. 

Therefore,  we  consider  the  inverse  problem 

3-*  =  tA  Xt-^  (2k) 

A  solution  to  (  2k  )  determines  the  ^  and  the  CL^ 
from  _  oo  gi  _ 

G  =  (^0  =  ,(25) 

By  stationarity,  it  would  suffice  to  find  J  ,  for 
(  25  )  becomes  5(X 

The  key  to  the  method  is  a  Hilbert  space  theorem 
and  corollary  of  Von  Neumann  (1950,  p.  55"56)  which  we  will 
assume  applies  to  and  interpret  in  terms  of  a  vector 

space . 

Theorem  (  5  ) 

Let  EL  and  be  projection  operators  onto  the  subspaces 
M,  and  Mi.  ,  respectively,  of  .  Let  7]  be  the 

portion  of  orthogonal  lo  n,  i-  and  let  G{  be  the 

projection  onto  7|  .  Then 

q  =  I-E  -  F-  E  (,-F)—  F(-E)-  E(-F(-E))-F  (-Cf— F)) , .  ■ 
-I-E  — F  +  EF-t-FE  -  EFE  -  FEF  •  •  .  . 

(This  is  hardly  a  rigorous  statement  of  the  theorem,  but 
it  conveys  the  meaning.) 

The  projection  operators  are  Just  the  Hilbert 
space  version  of  ordinary  vector  space  projections--! ,e .  dot 
products.  The  theorem,  if  interpreted  in  vector  space  says 
the  following: 


126 


Consider  a  vector  'ir  •  t  v  )  and  two  other  vec- 

tors  JaCf(0^)  o)  and  f°=  J3i  o)(  ~f  not  necessarily  ortho¬ 
gonal.  to  p  ) . 

Let  the  projection  of  a  vector  Into  J  be  given 
by  E  and  the  projection  onto  3  by  F 

Then  Ev.-(r-J  and  ^  "r  *  C~V't*  )(°  . 
Repeating,  F  E  ir  = 

etc.  This  is  pictured  In  Figure  2. 


Ft  v- 


(Pig.  2) 


Clearly  the  tenns  EFEPt  •••  are  growing  smaller  and  smaller 
as  projections  are  made,  back  and  forth.  In  the  limit, 

Q.-\r*'ir-E.'J-_FE  ^  +  EFE  a  .... 

is  Just  the  component  of  V  normal  to  both  J  and  f 
Note  that  in  this  space  EF-FE  ,  and  so  we  need  consider 
only  one  or  the  other.  In  a  more  general  3pace,  the  pro¬ 
jections  do  not  commute,  and  both  terms  are  necessary. 

If  a  third  vector,  X  not  in  the  plane  of./ 
and  /°  ,  and  the  projection  onto  it  had  been  considered, 

the  sequence  would  have  gone  to  zero--a  reflection  of  the 
3  dimensions  of  this  3pace.  In  a  space  of  higher  dimensions, 
a  3rd  operator,  or  even  more,  is  permissible. 


By  analogy,  the  projection  operators  in  ^ 
must  by  the  inner  products--the  Gramians— and  the  vectors 
tf  and  (°  must  correspond  to  the  component  one-dimensional 
time  series.  If  X*.  has  order  3>  a  sequence  of  the  form 


t  *  9 


j-£-F-G  +  EF  +  FE  +  GF+G&-EFE - -  (26) 

must  be  considered,  and  similarly,  for  a  series  of  order  n. 

In  applying  the  Von  Neumann  theorem  explicitly 
to  the  prediction  problem,  restriction  will  be  made  to  a 
process  of  order  2--slnce  from  (  26  )  the  projection 
sequence  for  higher  orders  is  very  complex. 

Let  *t"  (  k)  and  let  the  projection  on 

^  J  ^  V 

and  the  projection  on  X^r  .  Then  P;  =  >  Xp  )  Xp 

and  the  operator  applied  as  P^ 

fix’  =  X,  )x;  and  PX-  (*'«  ,  x;)x? 

(analogous  to  (flnf)  f  ).  Lot  (Xj  ,  ^<t)  -  dpc(  (2? ) 

(r:ot  a  restriction.  See  Part  IV  below),  and  let  M_t 
and  K*  denote  the  pasts  of  X*t  and  Xt  respectively. 
Then  the  portion  of  Xt  }  t  ‘ —  oo  normal  to 

must  be 

Thus,  the  Innovation  is  what  will  be 

determined  if  we  can  find  +  •  ^t 

(xt  ,  Xp)  =  flt-p  j  (xt  ,  XP)  -  t>t.p 

then  under  the  condition  (  27  ),  the  projections  commute, 


7  =  X. 

0  Tn 


and 
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Factoring  "X_p  out  of  each  term,  this-  can  be 
written  in  matrix  form  as 


1  0 

K-^j 

0  4n, 

+  1 

< 

-1 

0  <»•»  k-A 

0  J_ 

o 

o  <U 

which  is  seen  to  be  precisely 


(28) 


of  eqn.  (  2*4  ) . 

Let 


*> 


.  Then  (  28  )  becomes 


1  -  lViE^^E_-ZlE,En.,£_  +  .. 


(29) 


»n-*i  n  p 


Since  J0  is  now  explicitly  determined,  the  prediction 
problem  is  solved.  The  expectation  of  the  prediction 
error  i3, as  in  the  scalar  case, 


E  KS--  %  A 


n 


Computational  Procedure 

The  Wiener-Masani  technique  has  been  programmed 
in  Fortran  for  the  IBM  7090  computer  at  M.I.T.,  making 
as  great  use  as  possible  of  existing  programs. 

There  are  two  main  difficulties  that  present 
themselves  in  consideration  of  a  computational  procedure. 


->  * 

(28) 
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The  first  involves  the  question  of  convergence  of  the 
sequence  in  eqn.  (  29  )•  If  the  "vectors  are  parallel, 
there  will  be  no  convergence;  if  they  are  nearly  so, 
convergence  may  be  very  slow. 

The  second  difficulty  occurs  in  connection  with 
the  condition  eqn.  (27  ).  To  show  how  this  is  achieved 


in  practice,  it  is  necessary  to  consider  the  computing 
procedure.  In  actually  solving  the  problem,  the  method 


used  is  a  slight  variation  of  the  one  outlined. 

If  the  power  density  matrix  of 
and  if  ?  =  2.  ,  then  it  can  be  shown 


(see  Appendix  B)  that  the  power  density  spectrum  of  <=7t 
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Henc 


The  <xK  are  given  by  the  matrices  of  eqn.  (  29  ). 

e  ~  Ek-p’I-'”'  an^  . 


In  order  to  insure  that  condition  (  27  )  holds, 
it  is  necessary  to  prefactor  the  diagonal  elements  of  $ 
That  is  to  say,  if  5 ((  and  are  the  autopower 

spectra  of  X  t  and  Xt  >  then  we  find  their  minimum 
phase  factors  such  that 

?„M  =  (u») 

i„/<4  =  p,h  f,  h 

The  spectral  matrix  is  then 


The  problem  Is  worked  with  $  (to)  .  At  the  end,  the 
answer  Is  remultlplled  by  7]  and  T]*  .  This  prelim- 
inary  factorization  is  done  by  either  a  least  squares 
technique  {routine  WLLSFP)  or  the  Kolmogoroff  technique 
(FACTOR). 

There  is  some  ambiguity  about  the  definition  of 
the  wavelet.  If  one  defines  A  =  AK  C  KL&  ,  then 

of  course,  3?  =•  A  A  ,  a  somewhat  neater  factorization. 
But  with  this  definition,  A  =4'"  G  ,  an  explicit  know¬ 
ledge  of  G  is  required  (which  can  be  obtained  from  G  - 
T  ^  T  )  --  an  extra  burden  on  the  method. 
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If  a  solution  is  attempted  in  the  time  domain 
alone,  one  also  finds  a  knowledge  of  G  necessary,  for  we 
have  •  Then  (XK  ^ 0)  -  A*  0 


ivs  =  ]>  A*  (X* ,  X.n) 


and  the  wavelet  is  defined  only  up  to  the  missing  factor 

or  G*  . 

In  practice,  of  course,  one  might  wish  the 
wavelet  inverse  to  A  and  the  definition  A-  IAk? 

n 

introduces  an  asyrmetry  Into  the  relation  between  A  and 
its  Inverse. 

If  we  do  choose  to  normalize  to  fPl  (tP,f,KL, 

then  we  can  seek  a  factorization  where 

Ch  =  (XK  ^  •  ar,d  ‘‘;^en  the  relation  between  the 

C  ^  and  the  c$-K  is  completely  symmetric. 

As  a  convention,  it  is  perhaps  best  to  require 
an  extended  Kolmogoroff  normalization  Ao  “  .  Then 

G  and  its  square  roots  (see  Appendix  A)  must  be  positive. 

Then  there  is  no  ambiguity  about  the  factorization. 

T’he  matrix  nature  of  the  process  introduces 
certain  complications  into  the  computations.  A  flow  chart 
of  the  complete  procedure  is  appended  (fig.  3).  The  process 
is  still  not  fully  debugged,  but  the  computations  carried 
out  thus  far  Indicate  that  the  method  is  a  valid  one. 

The  procedure  outlined  in  the  flow  chart  is 


carried  out  by  the  subroutine  WIMAF1.  Inputs  to  the 
subroutine  are  two  time  series,  xl  and  x2,  their  lengths, 
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the  lengths  of  the  internal  spectral  computations,  and 
two  cut-off  parameters  for  the  Von  Neumann  projection 
process — a  maximum  allowable  error  in  each  term,  and 
a  maximum  number  of  tries  permitted  in  attempting  to  reach 
that  error. 

Subroutine  MACOPS  computes  the  auto-  and  cross¬ 
correlations  of  the  two-time  series,  (subroutine  QXCORR), 
weights  them  with  a  Daniell  spectral  window  (ADANL) ,  and 
sets  up  the  appropriate  power  density  matrices  (ASPECT, 

xspkct)  . 

WIMAF1  then  calls  subroutine  FACTOR,  which 
performs  the  scalar  Kolmogoroff  factorization  of  the 
diagonal  members  of  the  spectral  matrices.  The  two 
resulting  wavelets  are  then  taken  back  into  the  frequency 
domain,  by  COS  ISP.  Subroutine  PDDIV  performs  the  division 
on  the  spectral  matrix,  outlined  above,  that  whitens  the 
diagonal  elements. 

W I  WiFi  then  drops  the  constant  diagonal  terms, 
ar.d  calls  COSISP  once  more  taking  the  now  modified 
spectral  matrix  back  into  the  time  domain.  The  result 
is  the  correlation  matrices  in  the  proper  form  for  use 
in  the  Von  Neumann  projection  process.  The  Von  Neumann 
projections  are  formed  by  VONEPS  (Fig.  *4)  to  within  the 
allowable  error  set  by  the  input,  or  until  the  maximum 
number  of  projections  permitted  is  exceeded.  The  result 
is  the  time  domain  Inverse  of  the  variable  ^  .  This 
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is  again  Fourier  transformed  by  CQSISP,  and  then 
inverted  point-wise  in  a  loop  by  subroutine  COMAIN. 

The  process  is  them  completed  by  multiplying 
y  by  the  wavelets  previously  taken  out  by  FACTOR. 
This  is  performed  by  COMA ML.  The  result  is  then  the 
Wold  Wavelet. 

A  program,  MAR OCT,  has  also  been  written  for 
taking  the  square  root  of  a  matrix,  if  the  more  complete 
factorization  in  terms  of  0  is  wanted.  This  is  not, 
however,  currently  part  of  WIMAFl. 

The  procedure  is  immediately  expandable  to 
orders  greater  than  two  with  a  modification  of  MACOPS, 
the  setup  routine.  The  key  Von  Neumann  projection 
subroutine  VONEPS,  will  accept  a  process  of  any  order. 
The  number  of  computations  to  be  performed,  of  course, 
goes  up  very  rapidly  with  increasing  order. 

Since  certain  processes  are  best  performed 
with  the  variables  in  matrix  form  (the  projections,  for 
example),  and  some  must  be  done  with  the  variables  in 
linear  form  (the  Fourier  transforms),  there  are  a  number 
of  conversions  into  and  out  of  matrix  form.  This  process 
is  rendered  considerably  earier  by  a  matrix  transpose 
routine,  MATRA. 


r 
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Appendix  A 


The  Square  Root  of  a  Matrix. 


C 

U  -  LP  is  clearly  defined  when  G  is  a 
diagonal  matrix.  If  one  makes  the  convention  that  all 
roots  are  positive.  Since  G  is  always  non-singular 
Hermitian,  it  can  in  principle  always  be  diagonalized. 

To  obtain  the  square  root  when  G  is  not  in 
diagonal  form,  use  may  be  made  of  a  formula  of  Sylvester, 
(Hildebrand,  1952,  p.  66). 

Let  ^ ~  ^  be  the  eigenvalues 

of  an  arbitrary  Hermltian  matrix  *W.  Let  P  (x)  denote 

a  polynomial  in  "X  .  Then  Sylvester's  formula  states 


that 


where 


P 


p(w)  =  £  P(Ar)2KM 

-I 


ZK(w)  = 


Tx  (W-Arl) 


In  cases  of  degeneracy,  the  limit  can  be  evaluated  by 
l'Hopltal's  rule. 

Let  5  =  W-  I 


Then 


r  (l  +  B)‘ 


The  corresponding  polynomial  is  poO- 


=  /  +  V 


_  413) 

X!  ^ 


Assuming  the  series  converges,  it  can  be  cut  off  at 
of 

A  )  some  m,  and  treated  as  a  finite  polynomial. 
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Then 


(*> 


PlB)  -  M-iB-  •  •  •  ■ 

-  p(w  - 1)  -  |  (W-l)-  ^ 


i- 

*+■  '  *  ' 


X- 

-  y  *■ 


By  Sylvester's  formula,  this  becomes 
h 


PtB)=  2P« 


rt  ;  / 


J, 

(B-X,  I 

) 

1 

T 

I 


(,-+\  ^  >  a-  (o-Vl) 
k  1  T  (An-Xr) 


where  now  the  X  ^  are  the  eigenvalues  of  W—  I 

Hence  to  evaluate  G  we  need  the  eigenvalues  of  G  ~ X.  , 

a  relatively  simple  process  when  G  is  2x2. 


Appendix  B 


A  brief,  heuristic  proof  that  G  =  T* 

Let  ?„  =  2  &K  ~^h-K 
K 

Then 

a  ,  %)  =  SfA  (V. ,  xp  J  «// 

=  2£<R.., 


^  K 
•ah- 


\-k-p  +/)e  -T-  ,  ;  © 


a,?P) 


Hence 


is  the  Fourier  Transform  of 


But 


~  ‘"*w  1  A  UttUA  Vi  III  v*.  , 

CE^e-">(e‘e)(^/e^) 

(V?,)  =  Lg  .  Therefore,  the  Fourier 
•  '  ^  h  p 

Transform  of  (?h  .  -?P)  -  6  •  Hence  by  Fourier's  theorem, 
,  -  (%  /!  p~'u'S>n\S  Ha  P~ 

a 


G  --  (I^e~^2(e; 


/ 


=  Y  §Y 


* 
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